Businesses like banks that provide services have to worry about the problem of customer churn, i.e., customers leaving to join another service provider. It is important to understand which aspects of the service influence a customer's decision in this regard, so that management can concentrate improvement efforts on these priorities.
As a data scientist with the bank, you need to build a neural network based classifier that can determine whether a customer will leave the bank or not in the next 6 months.
CustomerId: Unique ID assigned to each customer
Surname: Last name of the customer
CreditScore: A score summarizing the customer's credit history
Geography: The customer's location
Gender: Gender of the customer
Age: Age of the customer
Tenure: Number of years for which the customer has been with the bank
NumOfProducts: Number of products the customer has purchased through the bank
Balance: Account balance
HasCrCard: Categorical variable indicating whether the customer has a credit card
EstimatedSalary: Estimated salary
IsActiveMember: Categorical variable indicating whether the customer is an active member of the bank (i.e., uses bank products regularly, makes transactions, etc.)
Exited: Whether or not the customer left the bank within six months; it can take two values, 0 (retained) and 1 (exited)
This is a commented Jupyter (IPython) notebook in which all the instructions and tasks to be performed are mentioned.
# Installing the libraries with the specified version.
!pip install tensorflow==2.15.0 scikit-learn==1.2.2 seaborn==0.13.1 matplotlib==3.7.1 numpy==1.25.2 pandas==2.0.3 imbalanced-learn==0.10.1 -q --user
Note: After running the above cell, please restart the notebook kernel/runtime (depending on whether you're using Jupyter Notebook or Google Colab) and then sequentially run all cells from the one below.
# Libraries to help with reading and manipulating data
import pandas as pd
import numpy as np
# Libraries to help with data visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Library to split data
from sklearn.model_selection import train_test_split
# Libraries to standardize and encode the data
from sklearn.preprocessing import StandardScaler, LabelEncoder
# importing different functions to build models
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import backend
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
# importing SMOTE
from imblearn.over_sampling import SMOTE
# importing metrics
from sklearn.metrics import confusion_matrix,roc_curve,classification_report,recall_score
import random
# Library to suppress warnings
import warnings
warnings.filterwarnings("ignore")
# Run the following lines if Colab is being used; otherwise, skip this cell
from google.colab import drive
drive.mount('/content/drive')
ds = pd.read_csv('/content/drive/MyDrive/Colab Datafiles/Churn.csv')# complete the code to load the dataset
ds1 = ds.copy()
# let's view the first 5 rows of the data
ds.head() ## Complete the code to view top 5 rows of the data
| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.00 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.80 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.00 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.10 | 0 |
# let's view the last 5 rows of the data
ds.tail() ## Complete the code to view last 5 rows of the data
| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9995 | 9996 | 15606229 | Obijiaku | 771 | France | Male | 39 | 5 | 0.00 | 2 | 1 | 0 | 96270.64 | 0 |
| 9996 | 9997 | 15569892 | Johnstone | 516 | France | Male | 35 | 10 | 57369.61 | 1 | 1 | 1 | 101699.77 | 0 |
| 9997 | 9998 | 15584532 | Liu | 709 | France | Female | 36 | 7 | 0.00 | 1 | 0 | 1 | 42085.58 | 1 |
| 9998 | 9999 | 15682355 | Sabbatini | 772 | Germany | Male | 42 | 3 | 75075.31 | 2 | 1 | 0 | 92888.52 | 1 |
| 9999 | 10000 | 15628319 | Walker | 792 | France | Female | 28 | 4 | 130142.79 | 1 | 1 | 0 | 38190.78 | 0 |
# Checking the number of rows and columns in the training data
ds.shape## Complete the code to view dimensions of the train data
(10000, 14)
ds.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   RowNumber        10000 non-null  int64
 1   CustomerId       10000 non-null  int64
 2   Surname          10000 non-null  object
 3   CreditScore      10000 non-null  int64
 4   Geography        10000 non-null  object
 5   Gender           10000 non-null  object
 6   Age              10000 non-null  int64
 7   Tenure           10000 non-null  int64
 8   Balance          10000 non-null  float64
 9   NumOfProducts    10000 non-null  int64
 10  HasCrCard        10000 non-null  int64
 11  IsActiveMember   10000 non-null  int64
 12  EstimatedSalary  10000 non-null  float64
 13  Exited           10000 non-null  int64
dtypes: float64(2), int64(9), object(3)
memory usage: 1.1+ MB
ds.describe().T
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| RowNumber | 10000.0 | 5.000500e+03 | 2886.895680 | 1.00 | 2500.75 | 5.000500e+03 | 7.500250e+03 | 10000.00 |
| CustomerId | 10000.0 | 1.569094e+07 | 71936.186123 | 15565701.00 | 15628528.25 | 1.569074e+07 | 1.575323e+07 | 15815690.00 |
| CreditScore | 10000.0 | 6.505288e+02 | 96.653299 | 350.00 | 584.00 | 6.520000e+02 | 7.180000e+02 | 850.00 |
| Age | 10000.0 | 3.892180e+01 | 10.487806 | 18.00 | 32.00 | 3.700000e+01 | 4.400000e+01 | 92.00 |
| Tenure | 10000.0 | 5.012800e+00 | 2.892174 | 0.00 | 3.00 | 5.000000e+00 | 7.000000e+00 | 10.00 |
| Balance | 10000.0 | 7.648589e+04 | 62397.405202 | 0.00 | 0.00 | 9.719854e+04 | 1.276442e+05 | 250898.09 |
| NumOfProducts | 10000.0 | 1.530200e+00 | 0.581654 | 1.00 | 1.00 | 1.000000e+00 | 2.000000e+00 | 4.00 |
| HasCrCard | 10000.0 | 7.055000e-01 | 0.455840 | 0.00 | 0.00 | 1.000000e+00 | 1.000000e+00 | 1.00 |
| IsActiveMember | 10000.0 | 5.151000e-01 | 0.499797 | 0.00 | 0.00 | 1.000000e+00 | 1.000000e+00 | 1.00 |
| EstimatedSalary | 10000.0 | 1.000902e+05 | 57510.492818 | 11.58 | 51002.11 | 1.001939e+05 | 1.493882e+05 | 199992.48 |
| Exited | 10000.0 | 2.037000e-01 | 0.402769 | 0.00 | 0.00 | 0.000000e+00 | 0.000000e+00 | 1.00 |
# let's check for missing values in the data
ds.isnull() ## Complete the code to check missing entries in the train data
| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 1 | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 2 | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 3 | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 4 | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 9996 | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 9997 | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 9998 | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 9999 | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
10000 rows × 14 columns
ds.isnull().sum()
RowNumber          0
CustomerId         0
Surname            0
CreditScore        0
Geography          0
Gender             0
Age                0
Tenure             0
Balance            0
NumOfProducts      0
HasCrCard          0
IsActiveMember     0
EstimatedSalary    0
Exited             0
dtype: int64
# Check for missing values
missing_values = ds.isnull().sum()
# Check for duplicate rows
duplicate_rows = ds.duplicated().sum()
# Check for unknown values (e.g., placeholders like '?', 'NA', 'unknown')
unknown_values = (ds == '?').sum() + (ds == 'NA').sum() + (ds == 'unknown').sum()
# Display the results
print("Missing values in each column:\n", missing_values)
print("\nNumber of duplicate rows: ", duplicate_rows)
print("\nUnknown values in each column:\n", unknown_values)
Missing values in each column:
RowNumber          0
CustomerId         0
Surname            0
CreditScore        0
Geography          0
Gender             0
Age                0
Tenure             0
Balance            0
NumOfProducts      0
HasCrCard          0
IsActiveMember     0
EstimatedSalary    0
Exited             0
dtype: int64

Number of duplicate rows:  0

Unknown values in each column:
RowNumber          0
CustomerId         0
Surname            0
CreditScore        0
Geography          0
Gender             0
Age                0
Tenure             0
Balance            0
NumOfProducts      0
HasCrCard          0
IsActiveMember     0
EstimatedSalary    0
Exited             0
dtype: int64
ds.nunique()
RowNumber          10000
CustomerId         10000
Surname             2932
CreditScore          460
Geography              3
Gender                 2
Age                   70
Tenure                11
Balance             6382
NumOfProducts          4
HasCrCard              2
IsActiveMember         2
EstimatedSalary     9999
Exited                 2
dtype: int64
# RowNumber and CustomerId are unique identifiers and Surname carries no predictive signal, so we drop all three
ds = ds.drop(['RowNumber', 'CustomerId', 'Surname'], axis=1)
# Calculate summary statistics for numerical variables
numerical_vars = ['CreditScore', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'EstimatedSalary']
summary_stats = ds[numerical_vars].describe()
summary_stats
| | CreditScore | Age | Tenure | Balance | NumOfProducts | EstimatedSalary |
|---|---|---|---|---|---|---|
| count | 10000.000000 | 10000.000000 | 10000.000000 | 10000.000000 | 10000.000000 | 10000.000000 |
| mean | 650.528800 | 38.921800 | 5.012800 | 76485.889288 | 1.530200 | 100090.239881 |
| std | 96.653299 | 10.487806 | 2.892174 | 62397.405202 | 0.581654 | 57510.492818 |
| min | 350.000000 | 18.000000 | 0.000000 | 0.000000 | 1.000000 | 11.580000 |
| 25% | 584.000000 | 32.000000 | 3.000000 | 0.000000 | 1.000000 | 51002.110000 |
| 50% | 652.000000 | 37.000000 | 5.000000 | 97198.540000 | 1.000000 | 100193.915000 |
| 75% | 718.000000 | 44.000000 | 7.000000 | 127644.240000 | 2.000000 | 149388.247500 |
| max | 850.000000 | 92.000000 | 10.000000 | 250898.090000 | 4.000000 | 199992.480000 |
Observations
Feature Scaling:
Since features like CreditScore, Age, Balance, and EstimatedSalary have different ranges and units, it is essential to standardize or normalize these features to ensure the model performs well.
Handling Zero Balances:
The large number of customers with a balance of 0 should be explored further; they could represent a distinct customer segment or need special handling during model training.
Tenure and Product Usage:
Tenure and NumOfProducts indicate customer loyalty and engagement. These features can be crucial for predicting churn and should be carefully analyzed for their impact on the model.
Age Diversity:
The wide age range suggests that different age groups may have different behaviors. Age might interact with other features like Balance and NumOfProducts in interesting ways.
Outliers:
High values in Balance and EstimatedSalary may act as outliers. It's important to check if these outliers disproportionately affect the model and consider techniques like log transformation if needed.
Next Steps
Data Preprocessing:
Normalize/standardize the numerical features. Encode categorical variables like Geography and Gender.
Feature Engineering:
Investigate zero balances and consider creating new features that capture the interaction between Balance and NumOfProducts.
Model Training:
Use the preprocessed data to train a neural network model, ensuring that features are scaled and properly encoded.
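The zero-balance handling suggested above can be sketched with a simple indicator feature. This is a minimal illustration on a toy frame (the `ZeroBalance` column name is a hypothetical choice); in the notebook it would be applied to `ds` before encoding:

```python
import pandas as pd

# Toy frame standing in for the churn data (balance values from the head() above)
df = pd.DataFrame({"Balance": [0.00, 83807.86, 159660.80, 0.00, 125510.82]})

# Flag zero-balance customers as a candidate engineered feature
df["ZeroBalance"] = (df["Balance"] == 0).astype(int)

print(df["ZeroBalance"].tolist())  # → [1, 0, 0, 1, 0]
```

The flag lets the model treat "no funds on deposit" as its own signal instead of mixing it into the continuous balance scale.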
# Histograms
ds[numerical_vars].hist(bins=30, figsize=(15, 10))
plt.suptitle('Histograms of Numerical Variables')
plt.show()
# Box plots
plt.figure(figsize=(15, 10))
for i, var in enumerate(numerical_vars):
    plt.subplot(2, 3, i + 1)
    sns.boxplot(y=ds[var])
    plt.title(f'Box plot of {var}')
plt.tight_layout()
plt.show()
Observations and Key Takeaways from the Histograms
1. CreditScore:
• Distribution: The credit scores are approximately normally distributed, with a peak around 650-700.
• Skewness: There is a slight left skew, with some customers clustered near the maximum value of 850.
• Implication: Credit score is a crucial variable, and its roughly normal distribution suggests the dataset is balanced in terms of creditworthiness. Standardizing this variable will help in modeling.
2. Age:
• Distribution: Age shows a right-skewed distribution, with a significant number of customers in the 30-40 range.
• Implication: The concentration of customers in the 30-40 age group indicates that age could be a significant factor in customer behavior and churn prediction. It might be beneficial to create age groups or bins to better capture trends.
3. Tenure:
• Distribution: Tenure appears uniformly distributed, with a slight decrease at the higher end (10 years).
• Implication: Since tenure is spread across all values with no specific trend, it might be used as is or binned to better understand its impact on churn.
4. Balance:
• Distribution: A large number of customers have a zero balance, while the rest of the distribution shows a normal-like spread with a peak around 100,000.
• Implication: Zero-balance customers might need special consideration, as they could represent inactive or low-value customers. This feature will likely be crucial for identifying churn patterns.
5. NumOfProducts:
• Distribution: Most customers have either 1 or 2 products; very few have 3 or 4.
• Implication: Customers with 3 or more products might show different behavior patterns. This can be a significant predictor of churn, as customers with fewer products might be more likely to leave.
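The binning idea mentioned for Age (and Tenure) above can be sketched with `pd.cut`. The bin edges here are hypothetical; sensible cut points would come from the histograms:

```python
import pandas as pd

# Sample ages spanning the observed 18-92 range
ages = pd.Series([18, 25, 34, 42, 58, 92])

# Hypothetical bin edges; intervals are right-closed, so 30 would fall in "18-30"
age_groups = pd.cut(ages, bins=[17, 30, 40, 50, 100],
                    labels=["18-30", "31-40", "41-50", "51+"])

print(age_groups.tolist())  # → ['18-30', '18-30', '31-40', '41-50', '51+', '51+']
```

The resulting categorical column can then be one-hot encoded alongside Geography and Gender if the binned form proves more informative than the raw values.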
# Frequency counts for categorical variables
categorical_vars = ['Geography', 'Gender', 'HasCrCard', 'IsActiveMember', 'Exited']
frequency_counts = {var: ds[var].value_counts() for var in categorical_vars}
frequency_counts
{'Geography': Geography
France 5014
Germany 2509
Spain 2477
Name: count, dtype: int64,
'Gender': Gender
Male 5457
Female 4543
Name: count, dtype: int64,
'HasCrCard': HasCrCard
1 7055
0 2945
Name: count, dtype: int64,
'IsActiveMember': IsActiveMember
1 5151
0 4849
Name: count, dtype: int64,
'Exited': Exited
0 7963
1 2037
Name: count, dtype: int64}
# Bar plots
plt.figure(figsize=(15, 10))
for i, var in enumerate(categorical_vars):
    plt.subplot(3, 2, i + 1)
    sns.countplot(x=ds[var])
    plt.title(f'Bar plot of {var}')
plt.tight_layout()
plt.show()
1. Geography:
• Distribution: France has the highest number of customers, followed by Germany and Spain.
• Implication: This geographic distribution should be taken into account, as customers from different regions may have different behaviors and churn rates. It might be useful to include interactions between Geography and other features in the model.
2. Gender:
• Distribution: The number of male and female customers is almost equal.
• Implication: The gender balance means any gender-based differences in churn can be observed and modeled. Include gender as a feature to see if it influences churn.
3. HasCrCard:
• Distribution: A significant majority of customers have a credit card.
• Implication: Credit card possession might be a factor in customer engagement and satisfaction. Analyzing its impact on churn can provide valuable insights.
4. IsActiveMember:
• Distribution: The distribution is nearly even between active and inactive members.
• Implication: Whether a customer is an active member could be a crucial factor in predicting churn; active members might be less likely to churn than inactive ones.
5. Exited:
• Distribution: There is a clear imbalance, with far more customers staying than exiting (churning).
• Implication: The imbalance in the target variable (Exited) suggests that class-imbalance techniques (e.g., SMOTE, undersampling) might be necessary. Addressing it is important so the model accurately predicts both classes.
Summary for Model Building
1. Geographical Insights: Incorporate the geographic distribution into feature engineering. Consider region-specific models or interaction terms between Geography and other features.
2. Gender Analysis: Include gender as a feature and explore its interaction with other variables to see if there are significant patterns related to churn.
3. Credit Card Ownership: Analyze how having a credit card impacts churn. This could be an important feature, possibly indicating higher engagement or dependency on bank services.
4. Customer Activity: Active-member status should be prominently featured in the model. It is a strong indicator of customer engagement and likely retention.
5. Class Imbalance: Apply techniques to address the class imbalance in the target variable so the model does not become biased towards the majority (non-churn) class.
# function to plot a boxplot and a histogram along the same scale.
def histogram_boxplot(data, feature, figsize=(12, 7), kde=False, bins=None):
    """
    Boxplot and histogram combined

    data: dataframe
    feature: dataframe column
    figsize: size of figure (default (12,7))
    kde: whether to show the density curve (default False)
    bins: number of bins for histogram (default None)
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows=2,  # number of rows of the subplot grid = 2
        sharex=True,  # x-axis will be shared among all subplots
        gridspec_kw={"height_ratios": (0.25, 0.75)},
        figsize=figsize,
    )  # creating the 2 subplots
    sns.boxplot(
        data=data, x=feature, ax=ax_box2, showmeans=True, color="violet"
    )  # boxplot will be created and a star will indicate the mean value of the column
    if bins:
        sns.histplot(data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins)  # histogram with the specified bins
    else:
        sns.histplot(data=data, x=feature, kde=kde, ax=ax_hist2)  # histogram with default bins
    ax_hist2.axvline(
        data[feature].mean(), color="green", linestyle="--"
    )  # add mean to the histogram
    ax_hist2.axvline(
        data[feature].median(), color="black", linestyle="-"
    )  # add median to the histogram
# function to create labeled barplots
def labeled_barplot(data, feature, perc=False, n=None):
    """
    Barplot with percentage at the top

    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of count (default is False)
    n: displays the top n category levels (default is None, i.e., display all levels)
    """
    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 1, 5))
    else:
        plt.figure(figsize=(n + 1, 5))

    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="Paired",
        order=data[feature].value_counts().index[:n].sort_values(),
    )

    for p in ax.patches:
        if perc:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage of each class of the category
        else:
            label = p.get_height()  # count of each level of the category

        x = p.get_x() + p.get_width() / 2  # x-coordinate of the bar center
        y = p.get_height()  # height of the bar

        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # annotate the count/percentage above the bar

    plt.show()  # show the plot
histogram_boxplot(ds,'CreditScore')
histogram_boxplot(ds,'Age') ## Complete the code to create histogram_boxplot for Age
histogram_boxplot(ds,'Balance') ## Complete the code to create histogram_boxplot for Balance
histogram_boxplot(ds,'EstimatedSalary') ## Complete the code to create histogram_boxplot for Estimated Salary
labeled_barplot(ds, "Exited", perc=True)
labeled_barplot(ds, "Geography", perc=True) ## Complete the code to create labeled_barplot for Geography
labeled_barplot(ds, "Gender", perc=True) ## Complete the code to create labeled_barplot for Gender
labeled_barplot(ds, "Tenure", perc=True) ## Complete the code to create labeled_barplot for Tenure
labeled_barplot(ds, "NumOfProducts", perc=True) ## Complete the code to create labeled_barplot for Number of products
labeled_barplot(ds, "HasCrCard", perc=True) ## Complete the code to create labeled_barplot for Has credit card
labeled_barplot(ds, "IsActiveMember", perc=True) ## Complete the code to create labeled_barplot for Is active member
# function to plot stacked bar chart
def stacked_barplot(data, predictor, target):
    """
    Print the category counts and plot a stacked bar chart

    data: dataframe
    predictor: independent variable
    target: target variable
    """
    count = data[predictor].nunique()
    sorter = data[target].value_counts().index[-1]
    tab1 = pd.crosstab(data[predictor], data[target], margins=True).sort_values(
        by=sorter, ascending=False
    )
    print(tab1)
    print("-" * 120)
    tab = pd.crosstab(data[predictor], data[target], normalize="index").sort_values(
        by=sorter, ascending=False
    )
    tab.plot(kind="bar", stacked=True, figsize=(count + 1, 5))
    plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
    plt.show()
# defining the list of numerical columns
cols_list = ["CreditScore","Age","Tenure","Balance","EstimatedSalary"]
plt.figure(figsize=(15, 7))
sns.heatmap(ds[cols_list].corr(), annot=True, vmin=-1, vmax=1, fmt=".2f", cmap="Spectral")
plt.show()
Key Takeaways from the Correlation Matrix
The correlation matrix shows the correlation coefficients between pairs of numerical features. These coefficients range from -1 to 1, indicating the strength and direction of the linear relationship between the variables. As the heatmap shows, all pairwise correlations here are close to zero, so none of these numerical features are strongly linearly related to one another, and multicollinearity is not a concern for modeling.
# Define the columns to analyze against 'Exited'
columns_to_analyze = ['CreditScore', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'EstimatedSalary', 'Geography', 'Gender', 'HasCrCard', 'IsActiveMember']
# Plot bivariate analysis for numerical features
for col in ['CreditScore', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'EstimatedSalary']:
    plt.figure(figsize=(10, 5))
    sns.boxplot(x='Exited', y=col, data=ds)
    plt.title(f'Bivariate Analysis of {col} vs Exited')
    plt.show()
# Plot bivariate analysis for categorical features
for col in ['Geography', 'Gender', 'HasCrCard', 'IsActiveMember']:
    plt.figure(figsize=(10, 5))
    sns.countplot(x=col, hue='Exited', data=ds)
    plt.title(f'Bivariate Analysis of {col} vs Exited')
    plt.show()
Observations from Bivariate Analysis
Age vs Exited
• Distribution: Customers who exited (churned) tend to be older than those who did not.
• Churn Behavior: Older customers have a higher propensity to leave the bank.

Key Takeaways
• Age Factor: Age is a significant factor in customer churn. Older customers might have different needs or face different issues that lead to higher churn rates.
• Churn Risk: Implement strategies to address the specific needs of older customers to reduce their churn rate.

Next Steps
• Feature Importance: Include Age as an important feature in the churn prediction model.
• Targeted Strategies: Design retention strategies focused on addressing the needs of older customers.
• Further Analysis: Investigate interactions between Age and other features to understand specific segments at higher risk.
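The age-segment analysis suggested above can be sketched with a groupby over hypothetical age bands (a toy frame is used here; in the notebook, `ds` with its `Age`/`Exited` columns would be used, and the band edges would come from the EDA):

```python
import pandas as pd

# Toy churn frame; values are illustrative only
demo = pd.DataFrame({"Age": [25, 34, 47, 52, 61, 29],
                     "Exited": [0, 0, 1, 1, 1, 0]})

# Churn rate per hypothetical age band
bands = pd.cut(demo["Age"], bins=[17, 40, 100], labels=["18-40", "41+"])
rate = demo.groupby(bands, observed=True)["Exited"].mean()

print(rate.to_dict())
```

Segment-level churn rates like these make it easy to spot which bands carry the risk before any model is trained.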
Tenure vs Exited
• Distribution: The tenure distribution is similar for exited and non-exited customers, with a slight tendency for customers with higher tenure to churn.
• Churn Behavior: Tenure alone might not be a strong predictor of churn.

Key Takeaways
• Tenure Factor: Tenure should be considered in conjunction with other factors for a better understanding of churn.
• Churn Risk: Customers with longer tenure might have accumulated grievances that lead to churn.

Next Steps
• Feature Importance: Include Tenure in the model to explore its interaction with other variables.
• Targeted Strategies: Implement strategies to address long-term customer grievances.
• Further Analysis: Analyze the relationship between Tenure and customer satisfaction.

Balance vs Exited
• Distribution: Customers who exited have a slightly higher account balance than those who did not.
• Churn Behavior: High-balance customers who exit might indicate dissatisfaction despite holding significant funds with the bank.

Key Takeaways
• Balance Factor: Balance is an important feature; high-balance customers should be closely monitored for churn risk.
• Churn Risk: High-balance customers may require special attention to understand and address their reasons for leaving.

Next Steps
• Feature Importance: Include Balance as a key feature in the churn prediction model.
• Targeted Strategies: Design retention efforts focused on high-balance customers.
• Further Analysis: Investigate reasons for churn among high-balance customers.

NumOfProducts vs Exited
• Distribution: The number of products held is similar for exited and non-exited customers, with a slight increase for customers who did not exit.
• Churn Behavior: The number of products might not show a strong direct correlation with churn.

Key Takeaways
• Product Engagement: The number of products could still be relevant in combination with other features.
• Churn Risk: Customers with fewer products might be more likely to churn.

Next Steps
• Feature Importance: Include NumOfProducts in the model to explore its combined effect with other features.
• Targeted Strategies: Encourage customers to adopt more products to increase engagement.
• Further Analysis: Analyze product combinations and their impact on churn.

EstimatedSalary vs Exited
• Distribution: Estimated salary shows no significant difference between exited and non-exited customers.
• Churn Behavior: Salary might not be a strong predictor of churn.

Key Takeaways
• Salary Factor: Estimated salary may not be a significant standalone predictor.
• Churn Risk: Consider salary in combination with other features.

Next Steps
• Feature Importance: Include EstimatedSalary in the model to explore potential interactions.
• Targeted Strategies: Focus on other, more significant factors for retention strategies.
• Further Analysis: Investigate potential combined effects with other features.

Geography vs Exited
• Distribution: Churn rates vary significantly by geography, with higher churn observed in Germany than in France and Spain.
• Churn Behavior: Geography is a critical factor, with regional differences impacting churn rates.

Key Takeaways
• Geographic Factor: Geography significantly influences churn rates.
• Churn Risk: Tailor retention strategies to address region-specific issues.

Next Steps
• Feature Importance: Include Geography as a key feature in the churn prediction model.
• Targeted Strategies: Design region-specific retention strategies.
• Further Analysis: Explore regional trends and their impact on customer behavior.

Gender vs Exited
• Distribution: Females have a higher churn rate than males.
• Churn Behavior: Gender differences suggest varying needs and issues leading to churn.

Key Takeaways
• Gender Factor: Gender is an important predictor of churn.
• Churn Risk: Implement gender-specific retention strategies.

Next Steps
• Feature Importance: Include Gender as a significant feature in the churn prediction model.
• Targeted Strategies: Address the specific needs and issues of female customers.
• Further Analysis: Investigate gender-specific behavior patterns further.

HasCrCard vs Exited
• Distribution: Customers without a credit card have a slightly higher proportion of churn than those with one.
• Churn Behavior: Credit card ownership is associated with somewhat lower churn rates.

Key Takeaways
• Credit Card Ownership: Having a credit card appears to be associated with a lower likelihood of churn.
• Churn Risk: Customers without credit cards might be at higher risk of churn.

Next Steps
• Feature Importance: Include HasCrCard as a feature in the churn prediction model.
• Targeted Strategies: Design targeted retention strategies for customers without credit cards.
• Further Analysis: Investigate other features in combination with HasCrCard to identify segments at higher risk of churn.

By considering these key insights, we can build a more accurate model for predicting customer churn and design targeted interventions to reduce churn rates.
stacked_barplot(ds, "Geography", "Exited" )
Exited        0     1    All
Geography
All        7963  2037  10000
Germany    1695   814   2509
France     4204   810   5014
Spain      2064   413   2477
------------------------------------------------------------------------------------------------------------------------
stacked_barplot(ds, "Gender", "Exited") ## Complete the code to plot stacked barplot for Exited and Gender
Exited     0     1    All
Gender
All     7963  2037  10000
Female  3404  1139   4543
Male    4559   898   5457
------------------------------------------------------------------------------------------------------------------------
stacked_barplot(ds, "HasCrCard", "Exited") ## Complete the code to plot stacked barplot for Exited and Has credit card
Exited        0     1    All
HasCrCard
All        7963  2037  10000
1          5631  1424   7055
0          2332   613   2945
------------------------------------------------------------------------------------------------------------------------
stacked_barplot(ds, "IsActiveMember", "Exited") ## Complete the code to plot stacked barplot for Exited and Is active member
Exited             0     1    All
IsActiveMember
All             7963  2037  10000
0               3547  1302   4849
1               4416   735   5151
------------------------------------------------------------------------------------------------------------------------
plt.figure(figsize=(5,5))
sns.boxplot(y='CreditScore',x='Exited',data=ds)
plt.show()
plt.figure(figsize=(5,5))
sns.boxplot(y='Age',x='Exited',data=ds) ## Complete the code to plot the boxplot for Exited and Age
plt.show()
plt.figure(figsize=(5,5))
sns.boxplot(y='Tenure',x='Exited',data=ds) ## Complete the code to plot the boxplot for Exited and Tenure
plt.show()
plt.figure(figsize=(5,5))
sns.boxplot(y='Balance',x='Exited',data=ds) ## Complete the code to plot the boxplot for Exited and Balance
plt.show()
plt.figure(figsize=(5,5))
sns.boxplot(y='NumOfProducts',x='Exited',data=ds) ## Complete the code to plot the boxplot for Exited and Number of products
plt.show()
plt.figure(figsize=(5,5))
sns.boxplot(y='EstimatedSalary',x='Exited',data=ds) ## Complete the code to plot the boxplot for Exited and Estimated Salary
plt.show()
ds = pd.get_dummies(ds,columns=ds.select_dtypes(include=["object"]).columns.tolist(),drop_first=True,dtype=float)
X = ds.drop(['Exited'], axis=1)  # all predictor columns
y = ds['Exited']  # target variable
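To make the encoding step above concrete, here is what `get_dummies` with `drop_first=True` produces for the two object columns (a toy frame is used; the notebook applies the same call to `ds`):

```python
import pandas as pd

# Toy frame with the two categorical columns from the churn data
demo = pd.DataFrame({"Geography": ["France", "Spain", "Germany"],
                     "Gender": ["Female", "Male", "Female"]})

# drop_first=True keeps k-1 indicator columns per category,
# avoiding the redundant "dummy variable trap" column
encoded = pd.get_dummies(demo, drop_first=True, dtype=float)

print(list(encoded.columns))  # → ['Geography_Germany', 'Geography_Spain', 'Gender_Male']
```

The dropped levels (France, Female) become the implicit baseline: a row with all zeros in the Geography indicators is a French customer.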
# Splitting the dataset into the Training and Testing set.
X_large, X_test, y_large, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42,stratify=y,shuffle = True) ## Complete the code to Split the X and y and obtain test set
# Splitting the dataset into the Training and Testing set.
X_train, X_val, y_train, y_val = train_test_split(X_large, y_large, test_size = 0.2, random_state = 42,stratify=y_large, shuffle = True) ## complete the code to Split X_large and y_large to obtain train and validation sets
print(X_train.shape, X_val.shape, X_test.shape)
(6400, 11) (1600, 11) (2000, 11)
print(y_train.shape, y_val.shape, y_test.shape)
(6400,) (1600,) (2000,)
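Because both splits use `stratify`, the ~20% churn rate is preserved in every subset. A quick check on a synthetic target with the same class balance (the real notebook splits `X` and `y` from the dataset; the frame below is a placeholder):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical feature frame; the target has the same 2037/10000 churn rate as the data
X = pd.DataFrame({"f": range(10000)})
y = pd.Series([1] * 2037 + [0] * 7963)

X_large, X_test, y_large, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y, shuffle=True)
X_train, X_val, y_train, y_val = train_test_split(
    X_large, y_large, test_size=0.2, random_state=42, stratify=y_large, shuffle=True)

# Each subset keeps roughly the same churn proportion (~0.2037)
for name, s in [("train", y_train), ("val", y_val), ("test", y_test)]:
    print(name, round(s.mean(), 4))
```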
Since the numerical features are on different scales, we will standardize them to bring them to a common scale.
# Creating an instance of the standard scaler
sc = StandardScaler()
# Columns to scale: the numeric features (the dummy-encoded and 0/1 columns are left as-is)
cols_list = ["CreditScore", "Age", "Tenure", "Balance", "NumOfProducts", "EstimatedSalary"]
# Fit on the training set only, then apply the same transform to validation and test
X_train[cols_list] = sc.fit_transform(X_train[cols_list])
X_val[cols_list] = sc.transform(X_val[cols_list])
X_test[cols_list] = sc.transform(X_test[cols_list])
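A quick sanity check, assuming `cols_list` holds the numeric column names: after `fit_transform`, each scaled training column should have mean ≈ 0 and (population) standard deviation ≈ 1. A sketch on a toy frame:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy frame standing in for X_train
rng = np.random.default_rng(42)
X_train = pd.DataFrame({"CreditScore": rng.normal(650, 90, 500),
                        "Age": rng.normal(39, 10, 500)})
cols_list = ["CreditScore", "Age"]

sc = StandardScaler()
X_train[cols_list] = sc.fit_transform(X_train[cols_list])

# StandardScaler centers each column and divides by the population std (ddof=0)
print(X_train[cols_list].mean().round(6).tolist())        # each ~0.0
print(X_train[cols_list].std(ddof=0).round(6).tolist())   # each ~1.0
```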
Write down the logic for choosing the best metric for this business scenario.
Recommendation: the bank wants to prevent customer churn, so missing an actual churner (a false negative) costs more than flagging a loyal customer (a false positive).
Recall should therefore be the priority metric, as it measures the fraction of actual churners the model identifies. The F1 score is also useful because it balances recall against precision, keeping false positives in check.
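To make the trade-off concrete, here is a small sketch with made-up labels showing why accuracy alone is misleading on imbalanced data like this (~20% churners):

```python
from sklearn.metrics import accuracy_score, recall_score, f1_score

# Hypothetical labels: 8 non-churners, 2 churners (imbalanced, like the data)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# A model that predicts "no churn" for everyone still scores 80% accuracy...
y_all_zero = [0] * 10
print(accuracy_score(y_true, y_all_zero))  # 0.8
print(recall_score(y_true, y_all_zero))    # 0.0 -- it catches no churners at all

# ...while recall and F1 expose that weakness immediately
y_better = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
print(recall_score(y_true, y_better))      # 0.5
print(f1_score(y_true, y_better))          # 0.5
```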
Let's create a function for plotting the confusion matrix
def make_confusion_matrix(actual_targets, predicted_targets):
"""
To plot the confusion_matrix with percentages
actual_targets: actual target (dependent) variable values
predicted_targets: predicted target (dependent) variable values
"""
cm = confusion_matrix(actual_targets, predicted_targets)
labels = np.asarray(
[
["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
for item in cm.flatten()
]
).reshape(cm.shape[0], cm.shape[1])
plt.figure(figsize=(6, 4))
sns.heatmap(cm, annot=labels, fmt="")
plt.ylabel("True label")
plt.xlabel("Predicted label")
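The heatmap follows scikit-learn's `confusion_matrix` layout: rows are actual classes, columns are predictions. A minimal check on toy labels:

```python
from sklearn.metrics import confusion_matrix

# Toy labels: two non-churners, three churners
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]

# Layout: [[TN, FP],
#          [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
print(cm)  # [[1 1]
           #  [1 2]]
```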
Let's create two blank dataframes that will store the recall values for all the models we build.
train_metric_df = pd.DataFrame(columns=["recall"])
valid_metric_df = pd.DataFrame(columns=["recall"])
backend.clear_session()
#Fixing the seed for random number generators so that we receive the same output every time
np.random.seed(2)
random.seed(2)
tf.random.set_seed(2)
#Initializing the neural network
model_0 = Sequential()
# Adding the input layer with 64 neurons and relu as activation function
model_0.add(Dense(64, activation='relu', input_dim = X_train.shape[1]))
# Complete the code to add a hidden layer (specify the # of neurons and the activation function)
model_0.add(Dense(32, activation='relu'))
# Complete the code to add the output layer with the number of neurons required.
model_0.add(Dense(1, activation='sigmoid'))
#Complete the code to use SGD as the optimizer.
optimizer = tf.keras.optimizers.SGD(learning_rate=0.001)
# uncomment one of the following lines to define the metric to be used
# metric = 'accuracy'
metric = keras.metrics.Recall()
# metric = keras.metrics.Precision()
# metric = keras.metrics.F1Score()
## Complete the code to compile the model with binary cross entropy as loss function and recall as the metric.
model_0.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=[metric])
model_0.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 64) 768
dense_1 (Dense) (None, 32) 2080
dense_2 (Dense) (None, 1) 33
=================================================================
Total params: 2881 (11.25 KB)
Trainable params: 2881 (11.25 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
# Fitting the ANN
history_0 = model_0.fit(
X_train, y_train,
batch_size=32, ## Complete the code to specify the batch size to use
validation_data=(X_val,y_val),
epochs=50, ## Complete the code to specify the number of epochs
verbose=1
)
Epoch 1/50   loss: 0.6131 - recall: 0.0729 - val_loss: 0.5794 - val_recall: 0.0000
Epoch 10/50  loss: 0.4822 - recall: 0.0000 - val_loss: 0.4831 - val_recall: 0.0000
Epoch 25/50  loss: 0.4504 - recall: 0.0138 - val_loss: 0.4548 - val_recall: 0.0092
Epoch 50/50  loss: 0.4287 - recall: 0.1273 - val_loss: 0.4377 - val_recall: 0.0920
[epoch-by-epoch log condensed: loss declines steadily on both sets; recall stays near zero until ~epoch 17, then climbs slowly to 0.13 (train) / 0.09 (validation) by epoch 50]
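A caveat before plotting from the history object: Keras auto-numbers metric instances, so a second `Recall()` created in the same session is logged as `recall_1` rather than `recall` (which is why the next model's logs show `recall_1`). A small sketch of looking the key up instead of hard-coding it, using a hypothetical history dict:

```python
# Hypothetical history dict, shaped like Keras' History.history for a model
# whose Recall() metric was auto-named 'recall_1' (second instance in a session)
history = {"loss": [0.45, 0.41], "recall_1": [0.10, 0.27],
           "val_loss": [0.43, 0.42], "val_recall_1": [0.14, 0.27]}

# Find the recall key by prefix rather than assuming it is exactly 'recall'
train_key = next(k for k in history if k.startswith("recall"))
val_key = "val_" + train_key
print(train_key, val_key)  # recall_1 val_recall_1
```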
Loss function
#Plotting Train Loss vs Validation Loss
plt.plot(history_0.history['loss'])
plt.plot(history_0.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
Recall
#Plotting Train recall vs Validation recall
plt.plot(history_0.history['recall'])
plt.plot(history_0.history['val_recall'])
plt.title('model recall')
plt.ylabel('Recall')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
#Predicting the results using 0.5 as the threshold
y_train_pred = model_0.predict(X_train)
y_train_pred = (y_train_pred > 0.5)
y_train_pred
200/200 [==============================] - 0s 1ms/step
array([[False],
[False],
[False],
...,
[False],
[False],
[False]])
#Predicting the results using 0.5 as the threshold
y_val_pred = model_0.predict(X_val) ## Complete the code to make prediction on the validation set
y_val_pred = (y_val_pred > 0.5)
y_val_pred
50/50 [==============================] - 0s 2ms/step
array([[False],
[False],
[False],
...,
[False],
[False],
[False]])
model_name = "NN with SGD"
train_metric_df.loc[model_name] = recall_score(y_train, y_train_pred)
valid_metric_df.loc[model_name] = recall_score(y_val, y_val_pred)
Classification report
#Classification report
cr = classification_report(y_train, y_train_pred)
print("Classification Report for NN with SGD as optimizer on training set")
print(cr)
Classification Report for NN with SGD as optimizer on training set
precision recall f1-score support
0 0.82 0.98 0.89 5096
1 0.65 0.13 0.21 1304
accuracy 0.81 6400
macro avg 0.73 0.56 0.55 6400
weighted avg 0.78 0.81 0.75 6400
#classification report
cr=classification_report(y_val, y_val_pred) ## Complete the code to check the model's performance on the validation set
print("Classification Report for NN with SGD as optimizer on validation set")
print(cr)
Classification Report for NN with SGD as optimizer on validation set
precision recall f1-score support
0 0.81 0.98 0.89 1274
1 0.59 0.09 0.16 326
accuracy 0.80 1600
macro avg 0.70 0.54 0.52 1600
weighted avg 0.76 0.80 0.74 1600
Confusion matrix
make_confusion_matrix(y_train, y_train_pred)
make_confusion_matrix(y_val, y_val_pred) ## Complete the code to check the model's performance on the validation set
Observations for Neural Network with SGD as Optimizer

Training and Validation Loss:
- Both losses decrease steadily over the 50 epochs, indicating that the model is learning and improving its predictions.
- The training loss is slightly lower than the validation loss, but the gap is not significant, suggesting that the model is not overfitting.

Training and Validation Recall:
- Recall starts very low for both sets and improves only gradually. By the end of training, validation recall is around 0.092, indicating that the model is not effective at identifying the positive class (churners).

Confusion Matrix:
- On both the training and validation sets, the true negative rate is high (non-churners are identified correctly) while the true positive rate is low (most churners are missed).

Classification Report:
- Precision for the positive class (churners) is relatively low, especially on the validation set, which means there are many false positives.
- Recall for the positive class is very low, so the model misses many actual churners.
- Overall accuracy (around 80% on both sets) looks good, but it is driven primarily by correct predictions on non-churners.

Summary of Performance:
- The SGD model struggles to identify churners, as evidenced by the low positive-class recall. The high overall accuracy is misleading given the class imbalance. Further tuning or a different model is needed to improve recall on the positive class, which is critical for churn prediction.
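Given these results, two inexpensive levers for raising churner recall are lowering the 0.5 decision threshold and passing `class_weight` to `fit`. A sketch of a threshold sweep on hypothetical scores (names and values below are made up for illustration, not from the model above):

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Hypothetical validation labels (20% churners) and model scores that are
# only mildly higher for churners, mimicking a weak classifier
rng = np.random.default_rng(0)
y_val = rng.binomial(1, 0.2, 1000)
probs = np.clip(rng.normal(0.3 + 0.15 * y_val, 0.15), 0, 1)

# Lowering the threshold trades precision for higher recall
for thr in (0.5, 0.4, 0.3):
    pred = (probs > thr).astype(int)
    print(thr, round(recall_score(y_val, pred), 2),
          round(precision_score(y_val, pred), 2))
```

Recall can only stay flat or rise as the threshold drops, since lowering it never removes a predicted positive; the cost shows up in precision.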
backend.clear_session()
#Fixing the seed for random number generators so that we receive the same output every time
np.random.seed(2)
random.seed(2)
tf.random.set_seed(2)
#Initializing the neural network
model_1 = Sequential()
#Complete the code to add an input layer (specify the # of neurons and activation function)
model_1.add(Dense(64, activation='relu', input_dim=X_train.shape[1]))
#Complete the code to add a hidden layer (specify the # of neurons and activation function)
model_1.add(Dense(32, activation='relu'))
#Complete the code to add an output layer with the required number of neurons and sigmoid as the activation function
model_1.add(Dense(1, activation='sigmoid'))
#Complete the code to use Adam as the optimizer.
optimizer = Adam(learning_rate=0.001)
# uncomment one of the following lines to define the metric to be used
# metric = 'accuracy'
metric = keras.metrics.Recall()
# metric = keras.metrics.Precision()
# metric = keras.metrics.F1Score()
# Complete the code to compile the model with binary cross entropy as loss function and recall as the metric
model_1.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=[metric])
model_1.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 64) 768
dense_1 (Dense) (None, 32) 2080
dense_2 (Dense) (None, 1) 33
=================================================================
Total params: 2881 (11.25 KB)
Trainable params: 2881 (11.25 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
#Fitting the ANN
history_1 = model_1.fit(
X_train,y_train,
batch_size= 32, ## Complete the code to specify the batch size to use
validation_data=(X_val,y_val),
epochs=50, ## Complete the code to specify the number of epochs
verbose=1
)
Epoch 1/50   loss: 0.4512 - recall_1: 0.1020 - val_loss: 0.4334 - val_recall_1: 0.1442
Epoch 10/50  loss: 0.3403 - recall_1: 0.4563 - val_loss: 0.3632 - val_recall_1: 0.4601
Epoch 25/50  loss: 0.3038 - recall_1: 0.5376 - val_loss: 0.3671 - val_recall_1: 0.4110
Epoch 50/50  loss: 0.2780 - recall_1: 0.5790 - val_loss: 0.3700 - val_recall_1: 0.4571
[epoch-by-epoch log condensed: training loss falls steadily while validation loss plateaus around 0.36-0.37 after ~epoch 17; validation recall fluctuates between roughly 0.37 and 0.56]
Loss function
#Plotting Train Loss vs Validation Loss
plt.plot(history_1.history['loss'])
plt.plot(history_1.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
Recall
#Plotting Train recall vs Validation recall
# Note: Keras auto-numbers repeated metric instances, so this run's metric is
# stored under 'recall_1', not 'recall'; look the key up to avoid a KeyError
recall_key = next(k for k in history_1.history if k.startswith("recall"))
plt.plot(history_1.history[recall_key])
plt.plot(history_1.history["val_" + recall_key])
plt.title('model recall')
plt.ylabel('Recall')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
#Predicting the results using 0.5 as the threshold
y_train_pred = model_1.predict(X_train)
y_train_pred = (y_train_pred > 0.5)
y_train_pred
200/200 [==============================] - 0s 1ms/step
array([[ True],
[False],
[False],
...,
[False],
[ True],
[False]])
#Predicting the results using 0.5 as the threshold
y_val_pred = model_1.predict(X_val)
y_val_pred = (y_val_pred > 0.5)
y_val_pred
50/50 [==============================] - 0s 3ms/step
array([[False],
[False],
[False],
...,
[False],
[False],
[ True]])
model_name = "NN with Adam"
train_metric_df.loc[model_name] = recall_score(y_train,y_train_pred)
valid_metric_df.loc[model_name] = recall_score(y_val,y_val_pred)
Classification report
#Classification report
cr=classification_report(y_train,y_train_pred)
print("Classification Report for NN with Adam as optimizer on training set")
print(cr)
Classification Report for NN with Adam as optimizer on training set
precision recall f1-score support
0 0.90 0.97 0.93 5096
1 0.82 0.60 0.69 1304
accuracy 0.89 6400
macro avg 0.86 0.78 0.81 6400
weighted avg 0.89 0.89 0.88 6400
#classification report
cr=classification_report(y_val,y_val_pred) ## Complete the code to check the model's performance on the validation set
print("Classification Report for NN with Adam as optimizer on validation set")
print(cr)
precision recall f1-score support
0 0.87 0.95 0.91 1274
1 0.70 0.46 0.55 326
accuracy 0.85 1600
macro avg 0.78 0.70 0.73 1600
weighted avg 0.84 0.85 0.84 1600
Confusion matrix
#Calculating the confusion matrix
make_confusion_matrix(y_train, y_train_pred)
#Calculating the confusion matrix
make_confusion_matrix(y_val,y_val_pred) ## Complete the code to check the model's performance on the validation set
Observations and Key Takeaways for NN with Adam Optimizer

Learning Curves:
- Model loss: the training loss consistently decreases over the epochs, indicating that the model is learning. The validation loss decreases initially but starts to fluctuate after around 20 epochs, suggesting potential overfitting or the need for further tuning.
- Model recall: recall improves over the epochs for both sets. Validation recall is more volatile than training recall, indicating variability in how well the model identifies churners on unseen data.

Confusion Matrices:
- Training set: TN 4921, FP 175, FN 525, TP 779.
- Validation set: TN 1209, FP 65, FN 177, TP 149.

Classification Reports:
- Training set: precision 0.90 (class 0) / 0.82 (class 1); recall 0.97 / 0.60; F1 0.93 / 0.69; accuracy 0.89.
- Validation set: precision 0.87 / 0.70; recall 0.95 / 0.46; F1 0.91 / 0.55; accuracy 0.85.

Key Takeaways:
- Improved performance: compared with SGD, the Adam model shows better recall for the positive class (churners), with higher precision and F1-scores as well.
- Class imbalance impact: recall for churners is still lower than desired, reflecting the persistent class imbalance. Precision for the positive class is relatively high, so when the model predicts churn it is often correct.
- Generalization: performance metrics are more consistent across training and validation than with SGD, though the volatility of validation recall suggests the model might benefit from further regularization or different dropout rates.
- Potential overfitting: the fluctuations in validation loss and recall suggest overfitting after a certain number of epochs; early stopping based on validation performance could help prevent it.
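The early-stopping suggestion can be implemented with a Keras callback. A sketch, with the hyperparameters (monitored metric, patience) chosen for illustration rather than tuned:

```python
from tensorflow.keras.callbacks import EarlyStopping

# Stop training once val_loss has not improved for 5 consecutive epochs,
# and roll back to the best weights seen so far
early_stop = EarlyStopping(monitor="val_loss", patience=5,
                           restore_best_weights=True)

# It is passed via the callbacks argument of fit(), e.g.:
# model_2.fit(X_train, y_train, batch_size=32, epochs=100,
#             validation_data=(X_val, y_val), callbacks=[early_stop])
```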
backend.clear_session()
#Fixing the seed for random number generators so that we receive the same output every time
np.random.seed(2)
random.seed(2)
tf.random.set_seed(2)
# Initializing the neural network
model_2 = Sequential()
# Adding the input layer with 32 neurons and relu as activation function
model_2.add(Dense(32, activation='relu', input_dim=X_train.shape[1]))
# Adding dropout with ratio of 0.2
model_2.add(Dropout(0.2))
# Adding a hidden layer with 32 neurons and relu as activation function
model_2.add(Dense(32, activation='relu'))
# Adding another hidden layer with 32 neurons and relu as activation function
model_2.add(Dense(32, activation='relu'))
# Adding dropout with ratio of 0.1
model_2.add(Dropout(0.1))
# Adding another hidden layer with 32 neurons and relu as activation function
model_2.add(Dense(32, activation='relu'))
# Adding the output layer with 1 neuron and sigmoid as activation function
model_2.add(Dense(1, activation='sigmoid'))
#Complete the code to use Adam as the optimizer.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
# uncomment one of the following lines to define the metric to be used
# metric = 'accuracy'
metric = keras.metrics.Recall()
# metric = keras.metrics.Precision()
# metric = keras.metrics.F1Score()
## Complete the code to compile the model with binary cross entropy as loss function and recall as the metric.
model_2.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=[metric])
# Summary of the model
model_2.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 32) 384
dropout (Dropout) (None, 32) 0
dense_1 (Dense) (None, 32) 1056
dense_2 (Dense) (None, 32) 1056
dropout_1 (Dropout) (None, 32) 0
dense_3 (Dense) (None, 32) 1056
dense_4 (Dense) (None, 1) 33
=================================================================
Total params: 3585 (14.00 KB)
Trainable params: 3585 (14.00 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
# Fitting the ANN with batch_size = 32 and 100 epochs
history_2 = model_2.fit(
X_train, y_train,
batch_size=32, ## Complete the code to specify the batch size.
epochs=100, ## Complete the code to specify the # of epochs.
verbose=1,
validation_data=(X_val, y_val)
)
Epoch 1/100 200/200 [==============================] - 2s 5ms/step - loss: 0.4842 - recall: 0.0284 - val_loss: 0.4426 - val_recall: 0.0000e+00 Epoch 2/100 200/200 [==============================] - 1s 4ms/step - loss: 0.4413 - recall: 0.0544 - val_loss: 0.4300 - val_recall: 0.0399 Epoch 3/100 200/200 [==============================] - 1s 4ms/step - loss: 0.4323 - recall: 0.1342 - val_loss: 0.4226 - val_recall: 0.2055 Epoch 4/100 200/200 [==============================] - 1s 4ms/step - loss: 0.4236 - recall: 0.2109 - val_loss: 0.4202 - val_recall: 0.3712 Epoch 5/100 200/200 [==============================] - 1s 4ms/step - loss: 0.4195 - recall: 0.2791 - val_loss: 0.4132 - val_recall: 0.2086 Epoch 6/100 200/200 [==============================] - 1s 5ms/step - loss: 0.4129 - recall: 0.2791 - val_loss: 0.4052 - val_recall: 0.3129 Epoch 7/100 200/200 [==============================] - 1s 6ms/step - loss: 0.4065 - recall: 0.3044 - val_loss: 0.3999 - val_recall: 0.3006 Epoch 8/100 200/200 [==============================] - 1s 6ms/step - loss: 0.4005 - recall: 0.3052 - val_loss: 0.4024 - val_recall: 0.3190 Epoch 9/100 200/200 [==============================] - 1s 6ms/step - loss: 0.3990 - recall: 0.2968 - val_loss: 0.4014 - val_recall: 0.2301 Epoch 10/100 200/200 [==============================] - 1s 4ms/step - loss: 0.3971 - recall: 0.2983 - val_loss: 0.3965 - val_recall: 0.3067 Epoch 11/100 200/200 [==============================] - 1s 4ms/step - loss: 0.3948 - recall: 0.3213 - val_loss: 0.3971 - val_recall: 0.4018 Epoch 12/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3911 - recall: 0.3566 - val_loss: 0.3940 - val_recall: 0.3006 Epoch 13/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3896 - recall: 0.3512 - val_loss: 0.3895 - val_recall: 0.3344 Epoch 14/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3895 - recall: 0.3413 - val_loss: 0.3886 - val_recall: 0.2883 Epoch 15/100 200/200 
[==============================] - 1s 3ms/step - loss: 0.3802 - recall: 0.3666 - val_loss: 0.3826 - val_recall: 0.3834 Epoch 16/100 200/200 [==============================] - 1s 4ms/step - loss: 0.3787 - recall: 0.3781 - val_loss: 0.3836 - val_recall: 0.3497 Epoch 17/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3773 - recall: 0.3850 - val_loss: 0.3773 - val_recall: 0.3589 Epoch 18/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3750 - recall: 0.3919 - val_loss: 0.3752 - val_recall: 0.3282 Epoch 19/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3653 - recall: 0.4072 - val_loss: 0.3709 - val_recall: 0.3221 Epoch 20/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3667 - recall: 0.4018 - val_loss: 0.3656 - val_recall: 0.4080 Epoch 21/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3603 - recall: 0.4195 - val_loss: 0.3646 - val_recall: 0.3834 Epoch 22/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3593 - recall: 0.4187 - val_loss: 0.3579 - val_recall: 0.4172 Epoch 23/100 200/200 [==============================] - 1s 4ms/step - loss: 0.3586 - recall: 0.4164 - val_loss: 0.3592 - val_recall: 0.4049 Epoch 24/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3550 - recall: 0.4310 - val_loss: 0.3558 - val_recall: 0.4325 Epoch 25/100 200/200 [==============================] - 1s 5ms/step - loss: 0.3474 - recall: 0.4479 - val_loss: 0.3569 - val_recall: 0.4018 Epoch 26/100 200/200 [==============================] - 1s 5ms/step - loss: 0.3496 - recall: 0.4317 - val_loss: 0.3529 - val_recall: 0.4387 Epoch 27/100 200/200 [==============================] - 1s 5ms/step - loss: 0.3496 - recall: 0.4325 - val_loss: 0.3524 - val_recall: 0.4080 Epoch 28/100 200/200 [==============================] - 1s 6ms/step - loss: 0.3392 - recall: 0.4678 - val_loss: 0.3500 - val_recall: 0.4356 Epoch 29/100 200/200 [==============================] - 1s 
3ms/step - loss: 0.3459 - recall: 0.4310 - val_loss: 0.3559 - val_recall: 0.5092 Epoch 30/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3388 - recall: 0.4647 - val_loss: 0.3507 - val_recall: 0.4816 Epoch 31/100 200/200 [==============================] - 1s 4ms/step - loss: 0.3437 - recall: 0.4502 - val_loss: 0.3496 - val_recall: 0.3926 Epoch 32/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3437 - recall: 0.4371 - val_loss: 0.3497 - val_recall: 0.4294 Epoch 33/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3375 - recall: 0.4555 - val_loss: 0.3498 - val_recall: 0.4816 Epoch 34/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3339 - recall: 0.4532 - val_loss: 0.3504 - val_recall: 0.4785 Epoch 35/100 200/200 [==============================] - 1s 4ms/step - loss: 0.3339 - recall: 0.4632 - val_loss: 0.3565 - val_recall: 0.5000 Epoch 36/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3371 - recall: 0.4586 - val_loss: 0.3498 - val_recall: 0.4479 Epoch 37/100 200/200 [==============================] - 1s 4ms/step - loss: 0.3324 - recall: 0.4839 - val_loss: 0.3472 - val_recall: 0.4479 Epoch 38/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3329 - recall: 0.4571 - val_loss: 0.3520 - val_recall: 0.4908 Epoch 39/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3326 - recall: 0.4693 - val_loss: 0.3535 - val_recall: 0.4908 Epoch 40/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3344 - recall: 0.4647 - val_loss: 0.3545 - val_recall: 0.4693 Epoch 41/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3344 - recall: 0.4509 - val_loss: 0.3488 - val_recall: 0.4663 Epoch 42/100 200/200 [==============================] - 1s 4ms/step - loss: 0.3328 - recall: 0.4563 - val_loss: 0.3494 - val_recall: 0.4509 Epoch 43/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3312 - recall: 
0.4548 - val_loss: 0.3490 - val_recall: 0.4387 Epoch 44/100 200/200 [==============================] - 1s 4ms/step - loss: 0.3260 - recall: 0.4716 - val_loss: 0.3534 - val_recall: 0.4908 Epoch 45/100 200/200 [==============================] - 1s 5ms/step - loss: 0.3262 - recall: 0.4663 - val_loss: 0.3518 - val_recall: 0.4601 Epoch 46/100 200/200 [==============================] - 1s 5ms/step - loss: 0.3272 - recall: 0.4701 - val_loss: 0.3543 - val_recall: 0.4571 Epoch 47/100 200/200 [==============================] - 1s 5ms/step - loss: 0.3245 - recall: 0.4770 - val_loss: 0.3588 - val_recall: 0.4172 Epoch 48/100 200/200 [==============================] - 1s 5ms/step - loss: 0.3312 - recall: 0.4540 - val_loss: 0.3545 - val_recall: 0.4387 Epoch 49/100 200/200 [==============================] - 1s 4ms/step - loss: 0.3264 - recall: 0.4686 - val_loss: 0.3602 - val_recall: 0.4724 Epoch 50/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3253 - recall: 0.4770 - val_loss: 0.3532 - val_recall: 0.4693 Epoch 51/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3283 - recall: 0.4663 - val_loss: 0.3545 - val_recall: 0.4724 Epoch 52/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3213 - recall: 0.4709 - val_loss: 0.3522 - val_recall: 0.4632 Epoch 53/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3222 - recall: 0.4847 - val_loss: 0.3556 - val_recall: 0.4785 Epoch 54/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3264 - recall: 0.4801 - val_loss: 0.3534 - val_recall: 0.4540 Epoch 55/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3251 - recall: 0.4877 - val_loss: 0.3539 - val_recall: 0.4049 Epoch 56/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3205 - recall: 0.4824 - val_loss: 0.3549 - val_recall: 0.4233 Epoch 57/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3231 - recall: 0.4655 - val_loss: 0.3522 - val_recall: 
0.4356 Epoch 58/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3215 - recall: 0.4793 - val_loss: 0.3532 - val_recall: 0.4448 Epoch 59/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3209 - recall: 0.4747 - val_loss: 0.3580 - val_recall: 0.5215 Epoch 60/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3198 - recall: 0.4946 - val_loss: 0.3538 - val_recall: 0.4110 Epoch 61/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3228 - recall: 0.4762 - val_loss: 0.3573 - val_recall: 0.4755 Epoch 62/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3194 - recall: 0.4632 - val_loss: 0.3573 - val_recall: 0.4632 Epoch 63/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3225 - recall: 0.4778 - val_loss: 0.3570 - val_recall: 0.4509 Epoch 64/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3233 - recall: 0.4770 - val_loss: 0.3550 - val_recall: 0.4479 Epoch 65/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3198 - recall: 0.4778 - val_loss: 0.3559 - val_recall: 0.4509 Epoch 66/100 200/200 [==============================] - 1s 5ms/step - loss: 0.3126 - recall: 0.4900 - val_loss: 0.3560 - val_recall: 0.4540 Epoch 67/100 200/200 [==============================] - 1s 5ms/step - loss: 0.3211 - recall: 0.4678 - val_loss: 0.3595 - val_recall: 0.5123 Epoch 68/100 200/200 [==============================] - 1s 5ms/step - loss: 0.3226 - recall: 0.4770 - val_loss: 0.3599 - val_recall: 0.5031 Epoch 69/100 200/200 [==============================] - 1s 5ms/step - loss: 0.3186 - recall: 0.4839 - val_loss: 0.3572 - val_recall: 0.4847 Epoch 70/100 200/200 [==============================] - 1s 4ms/step - loss: 0.3196 - recall: 0.4854 - val_loss: 0.3576 - val_recall: 0.4785 Epoch 71/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3167 - recall: 0.4686 - val_loss: 0.3553 - val_recall: 0.5092 Epoch 72/100 200/200 
[==============================] - 1s 3ms/step - loss: 0.3132 - recall: 0.5000 - val_loss: 0.3560 - val_recall: 0.4816 Epoch 73/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3166 - recall: 0.4931 - val_loss: 0.3538 - val_recall: 0.4847 Epoch 74/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3161 - recall: 0.4808 - val_loss: 0.3585 - val_recall: 0.4724 Epoch 75/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3173 - recall: 0.4831 - val_loss: 0.3547 - val_recall: 0.3988 Epoch 76/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3166 - recall: 0.4747 - val_loss: 0.3507 - val_recall: 0.4663 Epoch 77/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3173 - recall: 0.4870 - val_loss: 0.3550 - val_recall: 0.4693 Epoch 78/100 200/200 [==============================] - 0s 2ms/step - loss: 0.3141 - recall: 0.4923 - val_loss: 0.3541 - val_recall: 0.4479 Epoch 79/100 200/200 [==============================] - 0s 2ms/step - loss: 0.3121 - recall: 0.4969 - val_loss: 0.3579 - val_recall: 0.4755 Epoch 80/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3166 - recall: 0.4839 - val_loss: 0.3593 - val_recall: 0.4877 Epoch 81/100 200/200 [==============================] - 1s 2ms/step - loss: 0.3148 - recall: 0.4954 - val_loss: 0.3532 - val_recall: 0.4663 Epoch 82/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3116 - recall: 0.4916 - val_loss: 0.3656 - val_recall: 0.5215 Epoch 83/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3151 - recall: 0.4923 - val_loss: 0.3583 - val_recall: 0.4601 Epoch 84/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3133 - recall: 0.4801 - val_loss: 0.3674 - val_recall: 0.5092 Epoch 85/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3112 - recall: 0.4931 - val_loss: 0.3559 - val_recall: 0.4785 Epoch 86/100 200/200 [==============================] - 1s 
3ms/step - loss: 0.3152 - recall: 0.4908 - val_loss: 0.3624 - val_recall: 0.5245 Epoch 87/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3112 - recall: 0.4939 - val_loss: 0.3606 - val_recall: 0.4939 Epoch 88/100 200/200 [==============================] - 1s 5ms/step - loss: 0.3152 - recall: 0.4801 - val_loss: 0.3655 - val_recall: 0.4877 Epoch 89/100 200/200 [==============================] - 1s 5ms/step - loss: 0.3075 - recall: 0.5169 - val_loss: 0.3579 - val_recall: 0.4356 Epoch 90/100 200/200 [==============================] - 1s 5ms/step - loss: 0.3156 - recall: 0.4847 - val_loss: 0.3564 - val_recall: 0.4847 Epoch 91/100 200/200 [==============================] - 1s 5ms/step - loss: 0.3091 - recall: 0.5031 - val_loss: 0.3614 - val_recall: 0.4632 Epoch 92/100 200/200 [==============================] - 1s 5ms/step - loss: 0.3146 - recall: 0.4962 - val_loss: 0.3560 - val_recall: 0.4724 Epoch 93/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3117 - recall: 0.4801 - val_loss: 0.3563 - val_recall: 0.4755 Epoch 94/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3127 - recall: 0.4801 - val_loss: 0.3654 - val_recall: 0.5092 Epoch 95/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3112 - recall: 0.5000 - val_loss: 0.3621 - val_recall: 0.4663 Epoch 96/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3105 - recall: 0.5038 - val_loss: 0.3596 - val_recall: 0.4693 Epoch 97/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3122 - recall: 0.4885 - val_loss: 0.3588 - val_recall: 0.4755 Epoch 98/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3095 - recall: 0.4946 - val_loss: 0.3587 - val_recall: 0.4877 Epoch 99/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3096 - recall: 0.5008 - val_loss: 0.3599 - val_recall: 0.4693 Epoch 100/100 200/200 [==============================] - 1s 3ms/step - loss: 0.3103 - recall: 
0.5061 - val_loss: 0.3596 - val_recall: 0.4816
Loss function
#Plotting Train Loss vs Validation Loss
plt.plot(history_2.history['loss'])
plt.plot(history_2.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
From the plot above, the training and validation loss curves are smooth and track each other closely. Reducing the number of neurons and adding dropout layers helped: the overfitting seen in the previous model has largely been resolved.
#Plotting Train recall vs Validation recall
plt.plot(history_2.history['recall'])
plt.plot(history_2.history['val_recall'])
plt.title('model recall')
plt.ylabel('recall')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
#Predicting the results using 0.5 as the threshold
y_train_pred = model_2.predict(X_train)
y_train_pred = (y_train_pred > 0.5)
y_train_pred
200/200 [==============================] - 0s 2ms/step
array([[False],
[False],
[False],
...,
[False],
[False],
[False]])
#Predicting the results using 0.5 as the threshold.
y_val_pred = model_2.predict(X_val)
y_val_pred = (y_val_pred > 0.5)
y_val_pred
50/50 [==============================] - 0s 1ms/step
array([[False],
[False],
[False],
...,
[False],
[ True],
[ True]])
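The 0.5 cutoff used above is just the default; since recall on churners is the metric of interest, it can be worth sweeping the threshold on the validation probabilities before settling on one. A stdlib-only sketch with hypothetical probabilities and labels (in the notebook, the probabilities would come from model_2.predict(X_val) and the labels from y_val):

```python
# Hypothetical sigmoid outputs and true labels, for illustration only.
probs  = [0.10, 0.35, 0.48, 0.52, 0.60, 0.75, 0.81, 0.90]
labels = [0,    0,    1,    1,    0,    1,    1,    1]

def recall_at(threshold):
    # Recall = TP / (TP + FN) for the positive (churn) class.
    tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p > threshold)
    fn = sum(1 for p, y in zip(probs, labels) if y == 1 and p <= threshold)
    return tp / (tp + fn)

for t in (0.3, 0.5, 0.7):
    print(t, recall_at(t))  # recall rises as the threshold drops
```

Lowering the threshold turns more borderline customers into predicted churners, raising recall at the cost of precision; the right trade-off depends on the relative cost of a missed churner versus an unnecessary retention offer.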
model_name = "NN with Adam & Dropout"
train_metric_df.loc[model_name] = recall_score(y_train,y_train_pred)
valid_metric_df.loc[model_name] = recall_score(y_val,y_val_pred)
Classification report
#classification report
cr=classification_report(y_train,y_train_pred)
print("Classification Report of NN with Adam and dropout on training set")
print(cr)
Classification Report of NN with Adam and dropout on training set
              precision    recall  f1-score   support

           0       0.90      0.97      0.93      5096
           1       0.84      0.56      0.67      1304

    accuracy                           0.89      6400
   macro avg       0.87      0.76      0.80      6400
weighted avg       0.88      0.89      0.88      6400
#classification report
cr = classification_report(y_val,y_val_pred) ## Complete the code to check the model's performance on the validation set
print("Classification Report of NN with Adam and dropout on Validation set")
print(cr)
Classification Report of NN with Adam and dropout on Validation set
              precision    recall  f1-score   support

           0       0.88      0.96      0.92      1274
           1       0.73      0.48      0.58       326

    accuracy                           0.86      1600
   macro avg       0.81      0.72      0.75      1600
weighted avg       0.85      0.86      0.85      1600
Confusion matrix
#Calculating the confusion matrix
make_confusion_matrix(y_train, y_train_pred)
#Calculating the confusion matrix
make_confusion_matrix(y_val,y_val_pred) ## Complete the code to check the model's performance on the validation set
Key Observations and Takeaways for NN with Adam Optimizer and Dropout

Model Loss
- Training Loss: decreases steadily over the epochs, indicating that the model keeps learning and improving on the training data.
- Validation Loss: decreases initially and then fluctuates, indicating some residual overfitting. The fluctuations are small, suggesting the dropout layers are controlling overfitting to a good extent.

Model Recall
- Training Recall: improves steadily, indicating that the model is increasingly able to correctly identify positive cases in the training set.
- Validation Recall: also improves but fluctuates more than the training recall, suggesting some variability in performance on the validation data.

Confusion Matrix (Training Data)
- True Negatives (TN): 4959 (77.48%)
- False Positives (FP): 137 (2.14%)
- False Negatives (FN): 580 (9.06%)
- True Positives (TP): 724 (11.31%)

Confusion Matrix (Validation Data)
- True Negatives (TN): 1217 (76.06%)
- False Positives (FP): 57 (3.56%)
- False Negatives (FN): 169 (10.56%)
- True Positives (TP): 157 (9.81%)

Classification Report (Training Data)
- Precision: 0.90 (Class 0), 0.84 (Class 1)
- Recall: 0.97 (Class 0), 0.56 (Class 1)
- F1-Score: 0.93 (Class 0), 0.67 (Class 1)
- Accuracy: 0.89

Classification Report (Validation Data)
- Precision: 0.88 (Class 0), 0.73 (Class 1)
- Recall: 0.96 (Class 0), 0.48 (Class 1)
- F1-Score: 0.92 (Class 0), 0.58 (Class 1)
- Accuracy: 0.86

Key Takeaways
- Improvement over the previous model: the dropout layers have improved generalization, as evidenced by reduced overfitting and better validation metrics than the model without dropout.
- Recall: recall for the positive class (churn) is higher than in the previous models but still only 0.48 on validation, so the model misses a significant number of churners.
- Precision: precision is high for class 0 (0.88) and moderate for class 1 (0.73), so when the model predicts churn it is right more often than not. This matters in the business context, where false positives (predicting churn for customers who would have stayed) lead to unnecessary retention spending.
- Overall performance: accuracy of 0.86 on the validation set is good, but recall for the positive class needs to improve so that more churn cases are correctly identified.

By incorporating dropout, the model has become more robust to overfitting and more stable across epochs. However, further improvements are needed to raise recall for the positive class, which is critical in a churn prediction context.
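The headline figures above follow directly from the confusion-matrix counts; a quick arithmetic check on the validation counts quoted in the observations:

```python
# Validation confusion-matrix counts for the Adam + Dropout model (quoted above).
tn, fp, fn, tp = 1217, 57, 169, 157

recall    = tp / (tp + fn)                   # churners actually caught
precision = tp / (tp + fp)                   # predicted churners who really churn
accuracy  = (tp + tn) / (tn + fp + fn + tp)

print(round(recall, 2), round(precision, 2), round(accuracy, 2))
# 0.48 0.73 0.86 -- matching the validation classification report
```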
Let's apply SMOTE to balance the dataset and then tune the hyperparameters accordingly.
sm = SMOTE(random_state=42)
#Complete the code to fit SMOTE on the training data.
X_train_smote, y_train_smote= sm.fit_resample(X_train, y_train)
print('After UpSampling, the shape of train_X: {}'.format(X_train_smote.shape))
print('After UpSampling, the shape of train_y: {} \n'.format(y_train_smote.shape))
After UpSampling, the shape of train_X: (10192, 11) After UpSampling, the shape of train_y: (10192,)
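SMOTE grows the minority class by interpolating between a minority sample and one of its nearest minority-class neighbours, rather than duplicating rows. A stdlib-only sketch of that interpolation step (a simplification of what imbalanced-learn's SMOTE does internally; the feature vectors here are hypothetical):

```python
import random

random.seed(42)  # reproducibility, mirroring the notebook's fixed seeds

def smote_point(x, neighbor):
    """Synthesize one sample on the segment between x and a minority neighbor."""
    gap = random.random()  # uniform in [0, 1)
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]

# Two hypothetical scaled minority-class rows.
a = [0.2, 1.0, 0.5]
b = [0.6, 0.4, 0.9]
synthetic = smote_point(a, b)
print(synthetic)  # each coordinate lies between the two parent values
```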
Let's build a model with the balanced dataset
backend.clear_session()
#Fixing the seed for the random number generators so that we receive the same output every time
np.random.seed(2)
random.seed(2)
tf.random.set_seed(2)
# Initializing the model
model_3 = Sequential()
# Add the input layer with 32 neurons and relu activation function
model_3.add(Dense(32, activation='relu', input_dim=X_train_smote.shape[1]))
# Add a hidden layer with 16 neurons and relu activation function
model_3.add(Dense(16, activation='relu'))
# Add another hidden layer with 16 neurons and relu activation function
model_3.add(Dense(16, activation='relu'))
# Add the output layer with 1 neuron and sigmoid activation function
model_3.add(Dense(1, activation='sigmoid'))
#Complete the code to use SGD as the optimizer.
optimizer = tf.keras.optimizers.SGD(learning_rate=0.001)
# uncomment one of the following lines to define the metric to be used
# metric = 'accuracy'
metric = keras.metrics.Recall()
# metric = keras.metrics.Precision()
# metric = keras.metrics.F1Score()
# Complete the code to compile the model with binary cross entropy as loss function and recall as the metric
model_3.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=[metric])
model_3.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 32) 384
dense_1 (Dense) (None, 16) 528
dense_2 (Dense) (None, 16) 272
dense_3 (Dense) (None, 1) 17
=================================================================
Total params: 1201 (4.69 KB)
Trainable params: 1201 (4.69 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
# Fitting the ANN
history_3 = model_3.fit(
X_train_smote, y_train_smote,
batch_size=32, ## Specify the batch size to use
epochs=50, ## Specify the number of epochs
verbose=1,
validation_data=(X_val, y_val)
)
Epoch 1/50
319/319 [==============================] - 2s 3ms/step - loss: 0.6885 - recall: 0.4309 - val_loss: 0.6703 - val_recall: 0.3466
[... epochs 2-49 omitted: training and validation loss both decrease smoothly, with validation recall climbing from ~0.35 to ~0.68 ...]
Epoch 50/50
319/319 [==============================] - 1s 2ms/step - loss: 0.5604 - recall: 0.7329 - val_loss: 0.5677 - val_recall: 0.6779
Loss function
#Plotting Train Loss vs Validation Loss
plt.plot(history_3.history['loss'])
plt.plot(history_3.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
#Plotting Train recall vs Validation recall
plt.plot(history_3.history['recall'])
plt.plot(history_3.history['val_recall'])
plt.title('model recall')
plt.ylabel('recall')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
y_train_pred = model_3.predict(X_train_smote)
#Predicting the results using 0.5 as the threshold
y_train_pred = (y_train_pred > 0.5)
y_train_pred
319/319 [==============================] - 1s 1ms/step
array([[ True],
[False],
[False],
...,
[ True],
[ True],
[ True]])
y_val_pred = model_3.predict(X_val)
#Predicting the results using 0.5 as the threshold
y_val_pred = (y_val_pred > 0.5)
y_val_pred
50/50 [==============================] - 0s 2ms/step
array([[ True],
[False],
[False],
...,
[False],
[ True],
[ True]])
model_name = "NN with SMOTE & SGD"
train_metric_df.loc[model_name] = recall_score(y_train_smote,y_train_pred)
valid_metric_df.loc[model_name] = recall_score(y_val,y_val_pred)
Classification report
cr=classification_report(y_train_smote,y_train_pred)
print("Classification report of NN with SMOTE & SGD on training set")
print(cr)
Classification report of NN with SMOTE & SGD on training set
              precision    recall  f1-score   support

           0       0.73      0.71      0.72      5096
           1       0.72      0.73      0.73      5096

    accuracy                           0.72     10192
   macro avg       0.72      0.72      0.72     10192
weighted avg       0.72      0.72      0.72     10192
cr=classification_report(y_val,y_val_pred) ## Complete the code to check the model's performance on the validation set
print("Classification report of NN with SMOTE & SGD on validation set")
print(cr)
Classification report of NN with SMOTE & SGD on validation set
              precision    recall  f1-score   support

           0       0.90      0.72      0.80      1274
           1       0.38      0.68      0.49       326

    accuracy                           0.71      1600
   macro avg       0.64      0.70      0.64      1600
weighted avg       0.79      0.71      0.74      1600
Confusion matrix
#Calculating the confusion matrix
make_confusion_matrix(y_train_smote, y_train_pred)
#Calculating the confusion matrix
make_confusion_matrix(y_val,y_val_pred) ## Complete the code to check the model's performance on the validation set
Key Observations and Takeaways for NN with SMOTE & SGD
Model Loss
- Training loss: decreases steadily over the epochs, indicating that the model keeps learning from the training data.
- Validation loss: closely follows the training loss, decreasing over time, so the model generalizes to unseen data without significant overfitting.

Model Recall
- Training recall: improves sharply in the initial epochs and then stabilizes around 73%, showing the model increasingly identifies positive cases in the training set.
- Validation recall: improves similarly and stabilizes around 68%, tracking the training recall and suggesting consistent performance.

Confusion Matrix
- Training data: TN 3635 (35.67%), FP 1461 (14.33%), FN 1352 (13.27%), TP 3744 (36.73%)
- Validation data: TN 917 (57.31%), FP 357 (22.31%), FN 105 (6.56%), TP 221 (13.81%)

Classification Report
- Training data: precision 0.73 / 0.72, recall 0.71 / 0.73, F1-score 0.72 / 0.73 (class 0 / class 1); accuracy 0.72
- Validation data: precision 0.90 / 0.38, recall 0.72 / 0.68, F1-score 0.80 / 0.49 (class 0 / class 1); accuracy 0.71

Key Takeaways
- Balanced performance on training data: SMOTE has balanced the class distribution, so precision and recall both sit around 72-73% for each class on the training set.
- Improved recall for the positive class: recall for churners is significantly higher than in the earlier models without SMOTE, so the model now catches more churn cases in the training data.
- Validation performance: precision for the churn class drops sharply (0.38) while recall stays reasonably high (0.68), so the model identifies most churners but also flags many non-churners as churn.
- Generalization: training and validation loss curves stay close, indicating good generalization; the low churn precision on validation still leaves room to reduce false positives.
- Overall performance: validation accuracy of 71% with balanced recall. Combining SMOTE with the SGD optimizer yields balanced recall across classes and good generalization, but churn precision needs improvement to make the model more reliable.
Let's build a model with the balanced dataset
backend.clear_session()
# Fixing the seed for the random number generators so that we get the same output every time
np.random.seed(2)
random.seed(2)
tf.random.set_seed(2)
# Initializing the model
model_4 = Sequential()
# Complete the code to add an input layer (specify the # of neurons and activation function)
model_4.add(Dense(32, activation='relu', input_dim=X_train_smote.shape[1]))
# Complete the code to add a hidden layer (specify the # of neurons and the activation function)
model_4.add(Dense(16, activation='relu'))
# Complete the code to add another hidden layer (specify the # of neurons and the activation function)
model_4.add(Dense(16, activation='relu'))
# Complete the code to add the required number of neurons in the output layer and a suitable activation function
model_4.add(Dense(1, activation='sigmoid'))
model_4.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 32) 384
dense_1 (Dense) (None, 16) 528
dense_2 (Dense) (None, 16) 272
dense_3 (Dense) (None, 1) 17
=================================================================
Total params: 1201 (4.69 KB)
Trainable params: 1201 (4.69 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
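The parameter counts in the summary follow from the Dense-layer formula (inputs + 1 bias) × units; the 384 parameters of the first layer imply `input_dim = 11`, i.e. 11 preprocessed features. A quick arithmetic check:

```python
# Dense layer parameters = (inputs + 1 bias) * units
layer_sizes = [(11, 32), (32, 16), (16, 16), (16, 1)]
params = [(n_in + 1) * n_out for n_in, n_out in layer_sizes]
print(params)       # [384, 528, 272, 17]
print(sum(params))  # 1201
```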
#Complete the code to use Adam as the optimizer.
optimizer = tf.keras.optimizers.Adam()
# uncomment one of the following lines to define the metric to be used
# metric = 'accuracy'
metric = keras.metrics.Recall()
# metric = keras.metrics.Precision()
# metric = keras.metrics.F1Score()
# Complete the code to compile the model with binary cross entropy as loss function and recall as the metric
model_4.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=[metric])
model_4.summary()
# Fitting the ANN
history_4 = model_4.fit(
X_train_smote, y_train_smote,
batch_size=32, ## Batch size to use
epochs=50, ## Number of epochs
verbose=1,
validation_data=(X_val, y_val)
)
Epoch 1/50 319/319 [==============================] - 3s 5ms/step - loss: 0.5890 - recall: 0.6786 - val_loss: 0.5668 - val_recall: 0.6994 Epoch 2/50 319/319 [==============================] - 1s 3ms/step - loss: 0.5355 - recall: 0.7433 - val_loss: 0.5250 - val_recall: 0.6840 Epoch 3/50 319/319 [==============================] - 1s 3ms/step - loss: 0.4993 - recall: 0.7584 - val_loss: 0.5235 - val_recall: 0.7178 Epoch 4/50 319/319 [==============================] - 1s 3ms/step - loss: 0.4711 - recall: 0.7734 - val_loss: 0.4718 - val_recall: 0.6718 Epoch 5/50 319/319 [==============================] - 1s 2ms/step - loss: 0.4511 - recall: 0.7804 - val_loss: 0.5116 - val_recall: 0.7485 Epoch 6/50 319/319 [==============================] - 1s 2ms/step - loss: 0.4378 - recall: 0.7891 - val_loss: 0.4825 - val_recall: 0.7393 Epoch 7/50 319/319 [==============================] - 1s 2ms/step - loss: 0.4268 - recall: 0.7957 - val_loss: 0.4457 - val_recall: 0.6963 Epoch 8/50 319/319 [==============================] - 1s 2ms/step - loss: 0.4162 - recall: 0.7977 - val_loss: 0.4355 - val_recall: 0.6871 Epoch 9/50 319/319 [==============================] - 1s 3ms/step - loss: 0.4090 - recall: 0.8030 - val_loss: 0.4457 - val_recall: 0.6963 Epoch 10/50 319/319 [==============================] - 1s 3ms/step - loss: 0.4045 - recall: 0.8106 - val_loss: 0.4361 - val_recall: 0.6840 Epoch 11/50 319/319 [==============================] - 1s 2ms/step - loss: 0.4003 - recall: 0.8075 - val_loss: 0.4829 - val_recall: 0.7546 Epoch 12/50 319/319 [==============================] - 1s 3ms/step - loss: 0.3967 - recall: 0.8134 - val_loss: 0.4163 - val_recall: 0.5951 Epoch 13/50 319/319 [==============================] - 1s 3ms/step - loss: 0.3941 - recall: 0.8114 - val_loss: 0.4372 - val_recall: 0.6933 Epoch 14/50 319/319 [==============================] - 1s 5ms/step - loss: 0.3915 - recall: 0.8173 - val_loss: 0.4883 - val_recall: 0.7270 Epoch 15/50 319/319 [==============================] - 2s 
5ms/step - loss: 0.3872 - recall: 0.8191 - val_loss: 0.4773 - val_recall: 0.7331 Epoch 16/50 319/319 [==============================] - 2s 5ms/step - loss: 0.3849 - recall: 0.8228 - val_loss: 0.4503 - val_recall: 0.7025 Epoch 17/50 319/319 [==============================] - 1s 3ms/step - loss: 0.3848 - recall: 0.8255 - val_loss: 0.4790 - val_recall: 0.7362 Epoch 18/50 319/319 [==============================] - 1s 3ms/step - loss: 0.3821 - recall: 0.8279 - val_loss: 0.4313 - val_recall: 0.6411 Epoch 19/50 319/319 [==============================] - 1s 3ms/step - loss: 0.3822 - recall: 0.8214 - val_loss: 0.4268 - val_recall: 0.6534 Epoch 20/50 319/319 [==============================] - 1s 3ms/step - loss: 0.3797 - recall: 0.8234 - val_loss: 0.4568 - val_recall: 0.7117 Epoch 21/50 319/319 [==============================] - 1s 3ms/step - loss: 0.3769 - recall: 0.8289 - val_loss: 0.4174 - val_recall: 0.6104 Epoch 22/50 319/319 [==============================] - 1s 3ms/step - loss: 0.3767 - recall: 0.8291 - val_loss: 0.4212 - val_recall: 0.6166 Epoch 23/50 319/319 [==============================] - 1s 3ms/step - loss: 0.3730 - recall: 0.8301 - val_loss: 0.4373 - val_recall: 0.6871 Epoch 24/50 319/319 [==============================] - 1s 3ms/step - loss: 0.3733 - recall: 0.8332 - val_loss: 0.4384 - val_recall: 0.6779 Epoch 25/50 319/319 [==============================] - 1s 3ms/step - loss: 0.3709 - recall: 0.8367 - val_loss: 0.4477 - val_recall: 0.6503 Epoch 26/50 319/319 [==============================] - 1s 3ms/step - loss: 0.3679 - recall: 0.8363 - val_loss: 0.4642 - val_recall: 0.6994 Epoch 27/50 319/319 [==============================] - 1s 3ms/step - loss: 0.3675 - recall: 0.8371 - val_loss: 0.4325 - val_recall: 0.6656 Epoch 28/50 319/319 [==============================] - 1s 4ms/step - loss: 0.3659 - recall: 0.8389 - val_loss: 0.4414 - val_recall: 0.6656 Epoch 29/50 319/319 [==============================] - 1s 5ms/step - loss: 0.3629 - recall: 0.8432 - val_loss: 
0.4497 - val_recall: 0.6810 Epoch 30/50 319/319 [==============================] - 1s 5ms/step - loss: 0.3616 - recall: 0.8444 - val_loss: 0.5024 - val_recall: 0.7515 Epoch 31/50 319/319 [==============================] - 1s 3ms/step - loss: 0.3614 - recall: 0.8420 - val_loss: 0.4517 - val_recall: 0.6902 Epoch 32/50 319/319 [==============================] - 1s 3ms/step - loss: 0.3568 - recall: 0.8487 - val_loss: 0.4238 - val_recall: 0.6380 Epoch 33/50 319/319 [==============================] - 1s 3ms/step - loss: 0.3576 - recall: 0.8438 - val_loss: 0.4938 - val_recall: 0.7546 Epoch 34/50 319/319 [==============================] - 1s 3ms/step - loss: 0.3550 - recall: 0.8548 - val_loss: 0.4234 - val_recall: 0.6288 Epoch 35/50 319/319 [==============================] - 1s 3ms/step - loss: 0.3543 - recall: 0.8511 - val_loss: 0.4444 - val_recall: 0.6779 Epoch 36/50 319/319 [==============================] - 1s 3ms/step - loss: 0.3523 - recall: 0.8536 - val_loss: 0.4170 - val_recall: 0.5951 Epoch 37/50 319/319 [==============================] - 1s 3ms/step - loss: 0.3508 - recall: 0.8518 - val_loss: 0.4995 - val_recall: 0.7485 Epoch 38/50 319/319 [==============================] - 1s 2ms/step - loss: 0.3497 - recall: 0.8546 - val_loss: 0.4564 - val_recall: 0.6595 Epoch 39/50 319/319 [==============================] - 1s 3ms/step - loss: 0.3481 - recall: 0.8552 - val_loss: 0.4399 - val_recall: 0.6534 Epoch 40/50 319/319 [==============================] - 1s 3ms/step - loss: 0.3468 - recall: 0.8595 - val_loss: 0.4688 - val_recall: 0.7055 Epoch 41/50 319/319 [==============================] - 1s 3ms/step - loss: 0.3461 - recall: 0.8571 - val_loss: 0.4709 - val_recall: 0.6840 Epoch 42/50 319/319 [==============================] - 1s 4ms/step - loss: 0.3442 - recall: 0.8624 - val_loss: 0.4269 - val_recall: 0.6258 Epoch 43/50 319/319 [==============================] - 1s 4ms/step - loss: 0.3443 - recall: 0.8583 - val_loss: 0.4380 - val_recall: 0.6626 Epoch 44/50 319/319 
[==============================] - 2s 5ms/step - loss: 0.3432 - recall: 0.8617 - val_loss: 0.4576 - val_recall: 0.6871 Epoch 45/50 319/319 [==============================] - 1s 4ms/step - loss: 0.3404 - recall: 0.8599 - val_loss: 0.4368 - val_recall: 0.6319 Epoch 46/50 319/319 [==============================] - 1s 2ms/step - loss: 0.3404 - recall: 0.8620 - val_loss: 0.4375 - val_recall: 0.6350 Epoch 47/50 319/319 [==============================] - 1s 3ms/step - loss: 0.3383 - recall: 0.8575 - val_loss: 0.4633 - val_recall: 0.6718 Epoch 48/50 319/319 [==============================] - 1s 3ms/step - loss: 0.3377 - recall: 0.8634 - val_loss: 0.4754 - val_recall: 0.7117 Epoch 49/50 319/319 [==============================] - 1s 2ms/step - loss: 0.3379 - recall: 0.8591 - val_loss: 0.4660 - val_recall: 0.6840 Epoch 50/50 319/319 [==============================] - 1s 2ms/step - loss: 0.3361 - recall: 0.8660 - val_loss: 0.4354 - val_recall: 0.6135
Loss function
#Plotting Train Loss vs Validation Loss
plt.plot(history_4.history['loss'])
plt.plot(history_4.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
#Plotting Train recall vs Validation recall
plt.plot(history_4.history['recall'])
plt.plot(history_4.history['val_recall'])
plt.title('model recall')
plt.ylabel('Recall')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
y_train_pred = model_4.predict(X_train_smote)
#Predicting the results using 0.5 as the threshold
y_train_pred = (y_train_pred > 0.5)
y_train_pred
319/319 [==============================] - 0s 1ms/step
array([[ True],
[False],
[False],
...,
[ True],
[False],
[ True]])
y_val_pred = model_4.predict(X_val)
#Predicting the results using 0.5 as the threshold
y_val_pred = (y_val_pred > 0.5)
y_val_pred
50/50 [==============================] - 0s 1ms/step
array([[False],
[False],
[False],
...,
[False],
[False],
[ True]])
model_name = "NN with SMOTE & Adam"
train_metric_df.loc[model_name] = recall_score(y_train_smote,y_train_pred)
valid_metric_df.loc[model_name] = recall_score(y_val,y_val_pred)
Classification report
cr=classification_report(y_train_smote,y_train_pred)
print("Classification report on NN with SMOTE & Adam on training set ")
print(cr)
Classification report on NN with SMOTE & Adam on training set
              precision    recall  f1-score   support

           0       0.85      0.88      0.86      5096
           1       0.87      0.85      0.86      5096

    accuracy                           0.86     10192
   macro avg       0.86      0.86      0.86     10192
weighted avg       0.86      0.86      0.86     10192
cr=classification_report(y_val,y_val_pred) ## Complete the code to check the model's performance on the validation set
print("Classification report on NN with SMOTE & Adam on validation set ")
print(cr)
Classification report on NN with SMOTE & Adam on validation set
              precision    recall  f1-score   support

           0       0.90      0.85      0.87      1274
           1       0.51      0.61      0.56       326

    accuracy                           0.80      1600
   macro avg       0.71      0.73      0.72      1600
weighted avg       0.82      0.80      0.81      1600
Confusion matrix
#Calculating the confusion matrix
make_confusion_matrix(y_train_smote, y_train_pred)
#Calculating the confusion matrix
make_confusion_matrix(y_val,y_val_pred) ## Complete the code to check the model's performance on the validation set
Key Observations and Takeaways for NN with SMOTE & Adam Optimizer
Model Loss
- Training loss: decreases steadily over the epochs, indicating the model keeps learning from the training data.
- Validation loss: decreases initially but fluctuates, suggesting some overfitting even though the overall trend improves.

Model Recall
- Training recall: improves steadily, indicating the model increasingly identifies positive cases in the training set.
- Validation recall: also improves but fluctuates more than the training recall, indicating variable performance on the validation data.

Confusion Matrix
- Training data: TN 4470 (43.86%), FP 626 (6.14%), FN 773 (7.58%), TP 4323 (42.42%)
- Validation data: TN 1085 (67.81%), FP 189 (11.81%), FN 126 (7.88%), TP 200 (12.50%)

Classification Report
- Training data: precision 0.85 / 0.87, recall 0.88 / 0.85, F1-score 0.86 / 0.86 (class 0 / class 1); accuracy 0.86
- Validation data: precision 0.90 / 0.51, recall 0.85 / 0.61, F1-score 0.87 / 0.56 (class 0 / class 1); accuracy 0.80

Key Takeaways
- Balanced data impact: SMOTE balances the classes and improves recall for the churn class on the training data, so the model is better at identifying churn cases.
- Overfitting concerns: the fluctuations in validation loss and recall suggest some overfitting; even with the Adam optimizer, the model does not generalize perfectly to the validation data.
- Precision and recall: churn precision on validation is moderate (0.51), so the model is less confident about churn than non-churn; churn recall (0.61) improves on earlier models but still has room to grow.
- Overall performance: validation accuracy of 0.80. SMOTE with Adam detects more positive cases than previous attempts, but further tuning and perhaps additional regularization could stabilize validation performance.
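One way to address the overfitting and fluctuating validation metrics noted above, not used in the original notebook, is an `EarlyStopping` callback that halts training when the validation loss stalls and restores the best weights. A sketch of how it would plug into the `fit` call:

```python
import tensorflow as tf

# Stop training once validation loss has not improved for `patience`
# epochs, and roll the weights back to the best epoch seen
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=5,
    restore_best_weights=True,
)

# It would be passed to fit() like this (sketch, not run here):
# history = model_4.fit(
#     X_train_smote, y_train_smote,
#     validation_data=(X_val, y_val),
#     epochs=50, batch_size=32,
#     callbacks=[early_stop],
# )
```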
backend.clear_session()
# Fixing the seed for the random number generators so that we get the same output every time
np.random.seed(2)
random.seed(2)
tf.random.set_seed(2)
# Initializing the model
model_5 = Sequential()
# Adding the input layer with 32 neurons and relu as activation function
model_5.add(Dense(32, activation='relu', input_dim=X_train_smote.shape[1]))
# Adding dropout with a rate of 0.2
model_5.add(Dropout(0.2))
# Adding a hidden layer with 16 neurons and relu as activation function
model_5.add(Dense(16, activation='relu'))
# Adding dropout with a rate of 0.2
model_5.add(Dropout(0.2))
# Adding hidden layer with 8 neurons and relu as activation function
model_5.add(Dense(8, activation='relu'))
# Adding the output layer with 1 neuron and sigmoid as activation function
model_5.add(Dense(1, activation='sigmoid'))
#Complete the code to use Adam as the optimizer.
optimizer = tf.keras.optimizers.Adam()
# uncomment one of the following lines to define the metric to be used
# metric = 'accuracy'
metric = keras.metrics.Recall()
# metric = keras.metrics.Precision()
# metric = keras.metrics.F1Score()
# Complete the code to compile the model with binary cross entropy as loss function and recall as the metric
model_5.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=[metric])
model_5.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 32) 384
dropout (Dropout) (None, 32) 0
dense_1 (Dense) (None, 16) 528
dropout_1 (Dropout) (None, 16) 0
dense_2 (Dense) (None, 8) 136
dense_3 (Dense) (None, 1) 9
=================================================================
Total params: 1057 (4.13 KB)
Trainable params: 1057 (4.13 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
# Fitting the ANN
history_5 = model_5.fit(
X_train_smote, y_train_smote,
batch_size=32, # Specify the batch size to use
epochs=50, # Specify the number of epochs
verbose=1,
validation_data=(X_val, y_val)
)
Epoch 1/50 319/319 [==============================] - 2s 3ms/step - loss: 0.6299 - recall: 0.6703 - val_loss: 0.5559 - val_recall: 0.6595 Epoch 2/50 319/319 [==============================] - 1s 3ms/step - loss: 0.5768 - recall: 0.7206 - val_loss: 0.5516 - val_recall: 0.6963 Epoch 3/50 319/319 [==============================] - 1s 4ms/step - loss: 0.5579 - recall: 0.7359 - val_loss: 0.5347 - val_recall: 0.6656 Epoch 4/50 319/319 [==============================] - 2s 5ms/step - loss: 0.5512 - recall: 0.7382 - val_loss: 0.5276 - val_recall: 0.6748 Epoch 5/50 319/319 [==============================] - 1s 4ms/step - loss: 0.5407 - recall: 0.7416 - val_loss: 0.5338 - val_recall: 0.6871 Epoch 6/50 319/319 [==============================] - 1s 3ms/step - loss: 0.5278 - recall: 0.7382 - val_loss: 0.5090 - val_recall: 0.6779 Epoch 7/50 319/319 [==============================] - 1s 3ms/step - loss: 0.5158 - recall: 0.7422 - val_loss: 0.5007 - val_recall: 0.6656 Epoch 8/50 319/319 [==============================] - 1s 3ms/step - loss: 0.5064 - recall: 0.7473 - val_loss: 0.4899 - val_recall: 0.6718 Epoch 9/50 319/319 [==============================] - 1s 3ms/step - loss: 0.5012 - recall: 0.7433 - val_loss: 0.5126 - val_recall: 0.6963 Epoch 10/50 319/319 [==============================] - 1s 3ms/step - loss: 0.4952 - recall: 0.7490 - val_loss: 0.4744 - val_recall: 0.6595 Epoch 11/50 319/319 [==============================] - 1s 3ms/step - loss: 0.4888 - recall: 0.7551 - val_loss: 0.4944 - val_recall: 0.7117 Epoch 12/50 319/319 [==============================] - 1s 2ms/step - loss: 0.4814 - recall: 0.7649 - val_loss: 0.4954 - val_recall: 0.7178 Epoch 13/50 319/319 [==============================] - 1s 3ms/step - loss: 0.4735 - recall: 0.7688 - val_loss: 0.4636 - val_recall: 0.6871 Epoch 14/50 319/319 [==============================] - 1s 3ms/step - loss: 0.4730 - recall: 0.7732 - val_loss: 0.4943 - val_recall: 0.7178 Epoch 15/50 319/319 [==============================] - 1s 
2ms/step - loss: 0.4640 - recall: 0.7745 - val_loss: 0.4973 - val_recall: 0.7423 Epoch 16/50 319/319 [==============================] - 1s 4ms/step - loss: 0.4596 - recall: 0.7818 - val_loss: 0.4777 - val_recall: 0.7147 Epoch 17/50 319/319 [==============================] - 1s 4ms/step - loss: 0.4555 - recall: 0.7783 - val_loss: 0.4700 - val_recall: 0.7270 Epoch 18/50 319/319 [==============================] - 2s 5ms/step - loss: 0.4517 - recall: 0.7920 - val_loss: 0.4435 - val_recall: 0.7025 Epoch 19/50 319/319 [==============================] - 1s 4ms/step - loss: 0.4489 - recall: 0.7875 - val_loss: 0.4508 - val_recall: 0.7178 Epoch 20/50 319/319 [==============================] - 1s 3ms/step - loss: 0.4461 - recall: 0.7920 - val_loss: 0.4500 - val_recall: 0.7025 Epoch 21/50 319/319 [==============================] - 1s 3ms/step - loss: 0.4425 - recall: 0.7918 - val_loss: 0.4570 - val_recall: 0.7362 Epoch 22/50 319/319 [==============================] - 1s 2ms/step - loss: 0.4413 - recall: 0.7967 - val_loss: 0.4402 - val_recall: 0.6994 Epoch 23/50 319/319 [==============================] - 1s 3ms/step - loss: 0.4348 - recall: 0.7961 - val_loss: 0.4665 - val_recall: 0.7515 Epoch 24/50 319/319 [==============================] - 1s 3ms/step - loss: 0.4347 - recall: 0.8010 - val_loss: 0.4572 - val_recall: 0.7331 Epoch 25/50 319/319 [==============================] - 1s 3ms/step - loss: 0.4314 - recall: 0.8000 - val_loss: 0.4512 - val_recall: 0.7209 Epoch 26/50 319/319 [==============================] - 1s 3ms/step - loss: 0.4292 - recall: 0.8083 - val_loss: 0.4429 - val_recall: 0.7178 Epoch 27/50 319/319 [==============================] - 1s 5ms/step - loss: 0.4320 - recall: 0.8012 - val_loss: 0.4402 - val_recall: 0.7178 Epoch 28/50 319/319 [==============================] - 2s 5ms/step - loss: 0.4207 - recall: 0.8061 - val_loss: 0.4518 - val_recall: 0.7270 Epoch 29/50 319/319 [==============================] - 2s 6ms/step - loss: 0.4299 - recall: 0.8002 - val_loss: 
0.4591 - val_recall: 0.7454 Epoch 30/50 319/319 [==============================] - 1s 5ms/step - loss: 0.4251 - recall: 0.7996 - val_loss: 0.4568 - val_recall: 0.7423 Epoch 31/50 319/319 [==============================] - 2s 5ms/step - loss: 0.4229 - recall: 0.8012 - val_loss: 0.4740 - val_recall: 0.7577 Epoch 32/50 319/319 [==============================] - 1s 3ms/step - loss: 0.4222 - recall: 0.8142 - val_loss: 0.4399 - val_recall: 0.7239 Epoch 33/50 319/319 [==============================] - 1s 3ms/step - loss: 0.4196 - recall: 0.8026 - val_loss: 0.4578 - val_recall: 0.7577 Epoch 34/50 319/319 [==============================] - 1s 3ms/step - loss: 0.4195 - recall: 0.8104 - val_loss: 0.4332 - val_recall: 0.7301 Epoch 35/50 319/319 [==============================] - 1s 3ms/step - loss: 0.4169 - recall: 0.8106 - val_loss: 0.4494 - val_recall: 0.7362 Epoch 36/50 319/319 [==============================] - 1s 3ms/step - loss: 0.4159 - recall: 0.8124 - val_loss: 0.4263 - val_recall: 0.6994 Epoch 37/50 319/319 [==============================] - 1s 3ms/step - loss: 0.4180 - recall: 0.8065 - val_loss: 0.4497 - val_recall: 0.7423 Epoch 38/50 319/319 [==============================] - 1s 3ms/step - loss: 0.4101 - recall: 0.8161 - val_loss: 0.4534 - val_recall: 0.7485 Epoch 39/50 319/319 [==============================] - 1s 3ms/step - loss: 0.4118 - recall: 0.8191 - val_loss: 0.4498 - val_recall: 0.7485 Epoch 40/50 319/319 [==============================] - 1s 2ms/step - loss: 0.4134 - recall: 0.8173 - val_loss: 0.4558 - val_recall: 0.7515 Epoch 41/50 319/319 [==============================] - 1s 3ms/step - loss: 0.4206 - recall: 0.8118 - val_loss: 0.4355 - val_recall: 0.7086 Epoch 42/50 319/319 [==============================] - 1s 3ms/step - loss: 0.4136 - recall: 0.8165 - val_loss: 0.4463 - val_recall: 0.7423 Epoch 43/50 319/319 [==============================] - 2s 5ms/step - loss: 0.4085 - recall: 0.8146 - val_loss: 0.4440 - val_recall: 0.7577 Epoch 44/50 319/319 
[==============================] - 1s 5ms/step - loss: 0.4123 - recall: 0.8134 - val_loss: 0.4547 - val_recall: 0.7423 Epoch 45/50 319/319 [==============================] - 1s 5ms/step - loss: 0.4092 - recall: 0.8216 - val_loss: 0.4396 - val_recall: 0.7393 Epoch 46/50 319/319 [==============================] - 1s 4ms/step - loss: 0.4095 - recall: 0.8238 - val_loss: 0.4428 - val_recall: 0.7546 Epoch 47/50 319/319 [==============================] - 1s 3ms/step - loss: 0.4056 - recall: 0.8208 - val_loss: 0.4420 - val_recall: 0.7546 Epoch 48/50 319/319 [==============================] - 1s 3ms/step - loss: 0.4114 - recall: 0.8195 - val_loss: 0.4215 - val_recall: 0.7086 Epoch 49/50 319/319 [==============================] - 1s 3ms/step - loss: 0.4089 - recall: 0.8226 - val_loss: 0.4358 - val_recall: 0.7239 Epoch 50/50 319/319 [==============================] - 1s 3ms/step - loss: 0.4108 - recall: 0.8189 - val_loss: 0.4379 - val_recall: 0.7423
Loss function
#Plotting Train Loss vs Validation Loss
plt.plot(history_5.history['loss'])
plt.plot(history_5.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
#Plotting Train recall vs Validation recall
plt.plot(history_5.history['recall'])
plt.plot(history_5.history['val_recall'])
plt.title('model recall')
plt.ylabel('Recall')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
y_train_pred = model_5.predict(X_train_smote)
#Predicting the results using 0.5 as the threshold
y_train_pred = (y_train_pred > 0.5)
y_train_pred
319/319 [==============================] - 1s 3ms/step
array([[ True],
[False],
[False],
...,
[ True],
[ True],
[ True]])
y_val_pred = model_5.predict(X_val)
#Predicting the results using 0.5 as the threshold
y_val_pred = (y_val_pred > 0.5)
y_val_pred
50/50 [==============================] - 0s 2ms/step
array([[False],
[False],
[False],
...,
[False],
[ True],
[ True]])
model_name = "NN with SMOTE,Adam & Dropout"
train_metric_df.loc[model_name] = recall_score(y_train_smote,y_train_pred)
valid_metric_df.loc[model_name] = recall_score(y_val,y_val_pred)
Classification report
cr=classification_report(y_train_smote,y_train_pred)
print("Classification Report of NN with SMOTE, Adam and Dropout on training set")
print(cr)
Classification Report of NN with SMOTE, Adam and Dropout on training set
              precision    recall  f1-score   support

           0       0.85      0.83      0.84      5096
           1       0.83      0.85      0.84      5096

    accuracy                           0.84     10192
   macro avg       0.84      0.84      0.84     10192
weighted avg       0.84      0.84      0.84     10192
#classification report
cr=classification_report(y_val,y_val_pred) ## Complete the code to check the model's performance on the validation set
print("Classification Report of NN with SMOTE, Adam and Dropout on validation set")
print(cr)
Classification Report of NN with SMOTE, Adam and Dropout on validation set
              precision    recall  f1-score   support

           0       0.93      0.82      0.87      1274
           1       0.51      0.74      0.61       326

    accuracy                           0.80      1600
   macro avg       0.72      0.78      0.74      1600
weighted avg       0.84      0.80      0.82      1600
Confusion matrix
#Calculating the confusion matrix
make_confusion_matrix(y_train_smote, y_train_pred)
#Calculating the confusion matrix
make_confusion_matrix(y_val,y_val_pred) ## Complete the code to check the model's performance on the validation set
Key Observations and Takeaways for NN with SMOTE, Adam Optimizer, and Dropout
Model Loss
- Training loss: decreases steadily over the epochs, indicating the model keeps learning from the training data.
- Validation loss: also decreases but with more fluctuation than the training loss, suggesting some variability in performance on the validation set.

Model Recall
- Training recall: improves steadily, indicating the model's increasing ability to identify positive cases in the training set.
- Validation recall: improves but fluctuates significantly, indicating variability on the validation data.

Confusion Matrix
- Training data: TN 4220 (41.41%), FP 876 (8.59%), FN 758 (7.44%), TP 4338 (42.56%)
- Validation data: TN 1043 (65.19%), FP 231 (14.44%), FN 84 (5.25%), TP 242 (15.12%)

Classification Report
- Training data: precision 0.85 / 0.83, recall 0.83 / 0.85, F1-score 0.84 / 0.84 (class 0 / class 1); accuracy 0.84
- Validation data: precision 0.93 / 0.51, recall 0.82 / 0.74, F1-score 0.87 / 0.61 (class 0 / class 1); accuracy 0.80

Key Takeaways
- Improvement over previous models: combining SMOTE, the Adam optimizer, and dropout layers improves generalization, with a relatively stable validation loss and better validation recall than models without these techniques.
- Recall: churn recall on validation (0.74) is significantly better than in earlier models, so the model catches more churners, though there is still room to grow.
- Precision: churn precision (0.51) is much lower than for the non-churn class, so false positives could trigger unnecessary retention efforts.
- Overall performance: validation accuracy of 0.80; the dropout layers help control overfitting, as the training and validation loss curves indicate.
- Variability: the remaining fluctuations in validation recall and loss point to further tuning opportunities. Overall, the model is more robust to overfitting and identifies more churn cases, but precision and recall for the churn class still need balancing.
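One concrete way to rebalance precision and recall for the churn class, as suggested above, is to tune the decision threshold instead of fixing it at 0.5. A sketch using scikit-learn's `precision_recall_curve`; the probabilities below are illustrative stand-ins for `model_5.predict(X_val).ravel()`:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Illustrative labels and predicted probabilities (not the bank data)
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.10, 0.30, 0.35, 0.40, 0.45,
                   0.60, 0.80, 0.55, 0.90, 0.20])

precision, recall, thresholds = precision_recall_curve(y_true, y_prob)

# Choose the threshold that maximizes F1 instead of defaulting to 0.5;
# the last precision/recall point has no associated threshold
f1 = 2 * precision * recall / (precision + recall + 1e-12)
best = int(np.argmax(f1[:-1]))
print(f"best threshold: {thresholds[best]:.2f}, F1 there: {f1[best]:.2f}")

# Apply it in place of the fixed 0.5 cut-off
y_pred = (y_prob >= thresholds[best]).astype(int)
```

In a churn setting the threshold could also be chosen to guarantee a minimum recall while maximizing precision, depending on the relative cost of missed churners versus wasted retention offers.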
print("Training performance comparison")
train_metric_df
Training performance comparison
| | recall |
|---|---|
| NN with SGD | 0.128834 |
| NN with Adam | 0.597393 |
| NN with Adam & Dropout | 0.555215 |
| NN with SMOTE & SGD | 0.734694 |
| NN with SMOTE & Adam | 0.848312 |
| NN with SMOTE,Adam & Dropout | 0.851256 |
print("Validation set performance comparison")
valid_metric_df
Validation set performance comparison
| | recall |
|---|---|
| NN with SGD | 0.092025 |
| NN with Adam | 0.457055 |
| NN with Adam & Dropout | 0.481595 |
| NN with SMOTE & SGD | 0.677914 |
| NN with SMOTE & Adam | 0.613497 |
| NN with SMOTE,Adam & Dropout | 0.742331 |
train_metric_df - valid_metric_df
| | recall |
|---|---|
| NN with SGD | 0.036810 |
| NN with Adam | 0.140337 |
| NN with Adam & Dropout | 0.073620 |
| NN with SMOTE & SGD | 0.056780 |
| NN with SMOTE & Adam | 0.234815 |
| NN with SMOTE,Adam & Dropout | 0.108925 |
Based on the results provided, the best model can be determined by looking at the recall on both the training and validation sets, as well as the difference between training and validation recalls to assess overfitting.
Here are the key points from the data:
- Training set recall: higher values indicate a better fit to the training data.
- Validation set recall: higher values indicate better generalization to unseen data.
- Difference between training and validation recall: smaller differences indicate less overfitting.

| Model | Training recall | Validation recall | Difference |
|---|---|---|---|
| NN with SGD | 0.128834 | 0.092025 | 0.036810 |
| NN with Adam | 0.597393 | 0.457055 | 0.140337 |
| NN with Adam & Dropout | 0.555215 | 0.481595 | 0.073620 |
| NN with SMOTE & SGD | 0.734694 | 0.677914 | 0.056780 |
| NN with SMOTE & Adam | 0.848312 | 0.613497 | 0.234815 |
| NN with SMOTE, Adam & Dropout | 0.851256 | 0.742331 | 0.108925 |

Analysis
- Highest validation recall: the NN with SMOTE, Adam & Dropout model (0.742331).
- Balanced performance: its gap between training and validation recall (0.108925) is acceptable, indicating reasonable generalization without excessive overfitting.
- Low overfitting: NN with SMOTE & SGD also performs well, with a validation recall of 0.677914 and a small train-validation gap (0.056780).

Conclusion: combining the highest validation recall with a reasonably small train-validation gap, NN with SMOTE, Adam & Dropout is the best model overall.
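The selection rule described above can be written down explicitly. The sketch below rebuilds the recall tables inline (values copied from the tables above) and picks the model with the highest validation recall among those whose train-validation gap stays under a 0.15 tolerance; the tolerance is an illustrative assumption, not a value from the notebook.

```python
import pandas as pd

# Recall values reported above, rebuilt inline so the sketch is self-contained
models = ["NN with SGD", "NN with Adam", "NN with Adam & Dropout",
          "NN with SMOTE & SGD", "NN with SMOTE & Adam",
          "NN with SMOTE, Adam & Dropout"]
train_recall = pd.Series([0.128834, 0.597393, 0.555215,
                          0.734694, 0.848312, 0.851256], index=models)
valid_recall = pd.Series([0.092025, 0.457055, 0.481595,
                          0.677914, 0.613497, 0.742331], index=models)

# The train-validation gap flags overfitting
gap = train_recall - valid_recall

# Rule: highest validation recall among models whose gap stays under 0.15
candidates = valid_recall[gap < 0.15]
best_model = candidates.idxmax()
print(best_model)  # NN with SMOTE, Adam & Dropout
```

Only NN with SMOTE & Adam is excluded by the gap filter (0.234815), and among the rest the SMOTE, Adam & Dropout model has the top validation recall.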
y_test_pred = model_5.predict(X_test)  # model_5: NN with SMOTE, Adam & Dropout, selected as the best model
y_test_pred = (y_test_pred > 0.5)
print(y_test_pred)
63/63 [==============================] - 0s 1ms/step [[False] [False] [False] ... [ True] [False] [False]]
# Print the classification report
cr=classification_report(y_test,y_test_pred)
print(cr)
precision recall f1-score support
0 0.93 0.81 0.86 1593
1 0.50 0.74 0.60 407
accuracy 0.79 2000
macro avg 0.71 0.78 0.73 2000
weighted avg 0.84 0.79 0.81 2000
#Calculating the confusion matrix
make_confusion_matrix(y_test,y_test_pred)
Observations and Key Takeaways: Model 5 (NN with SMOTE, Adam & Dropout) on Test Data

Observations

Confusion matrix:
- True Negatives (TN): 1286 (64.30%)
- False Positives (FP): 307 (15.35%)
- False Negatives (FN): 104 (5.20%)
- True Positives (TP): 303 (15.15%)

Classification report:
- Class 0 (Not Churned): precision 0.93, recall 0.81, F1-score 0.86, support 1593
- Class 1 (Churned): precision 0.50, recall 0.74, F1-score 0.60, support 407

Overall performance:
- Accuracy: 0.79
- Macro average: precision 0.71, recall 0.78, F1-score 0.73
- Weighted average: precision 0.84, recall 0.79, F1-score 0.81

Key Takeaways
- Precision and recall: the precision for class 0 is high (0.93), so most predicted non-churn instances are correct. The recall for class 1 is relatively high (0.74), so the model identifies a significant share of actual churners.
- Balanced performance: the model leans toward not missing churn cases, as shown by the higher recall for class 1; however, the precision for class 1 (0.50) is lower, indicating a sizable number of false-positive churn predictions.
- F1-score: the class 1 F1-score of 0.60 suggests room for improvement in balancing precision and recall for churn predictions.
- Overall accuracy: 0.79 on the test set demonstrates the model's effectiveness on both classes, though with a noticeable number of misclassifications.
- Macro vs. weighted averages: the macro average recall (0.78) being close to the accuracy (0.79) indicates fairly consistent performance across both classes, and the weighted averages show strong overall performance given the larger number of non-churn cases.

Conclusion: the NN with SMOTE, Adam & Dropout model performs solidly on the test data, particularly its high recall for churn cases, which is crucial for minimizing missed churners. Despite some false positives, it maintains a good balance and is effective for practical churn prediction. Further tuning could improve precision for churn predictions while maintaining or improving recall.
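The report figures can be re-derived from the confusion-matrix counts above; a quick sanity check, using only arithmetic on the four counts:

```python
# Confusion-matrix counts reported above for the NN with SMOTE, Adam & Dropout model
tn, fp, fn, tp = 1286, 307, 104, 303

precision_churn = tp / (tp + fp)               # share of predicted churners who actually churn
recall_churn = tp / (tp + fn)                  # share of actual churners the model catches
accuracy = (tn + tp) / (tn + fp + fn + tp)     # correct predictions over all 2000 test rows
f1_churn = 2 * precision_churn * recall_churn / (precision_churn + recall_churn)

print(round(precision_churn, 2), round(recall_churn, 2), round(f1_churn, 2))  # 0.5 0.74 0.6
```

These match the class 1 row of the classification report (0.50 / 0.74 / 0.60), and the accuracy works out to 1589/2000 ≈ 0.79.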
Actionable Insights and Business Recommendations

Actionable Insights
- High recall for churn detection: the model successfully identifies most of the customers who are likely to churn, which is crucial for proactive retention strategies.
- Precision trade-off: while recall for churn is high, precision is moderate. Many churners are correctly identified, but there are also false positives, so some customers predicted to churn may not actually do so.
- Overall model performance: the model's accuracy and balanced performance suggest it is reliable for churn prediction, with room for improvement in reducing false positives.
- Effective handling of imbalanced data: using SMOTE to balance the training data has proven effective, yielding a model that performs well on both churn and non-churn classes.

Business Recommendations
- Proactive retention strategies: use the model to identify at-risk customers and target them with personalized offers, loyalty programs, or enhanced customer service, focusing on those predicted to churn.
- Resource allocation: given the moderate precision, consider a tiered approach. High-risk churners (high model confidence) should receive immediate, significant attention, while lower-risk cases can receive less intensive interventions.
- Further model tuning: continue refining the model to improve precision while maintaining high recall, e.g. by exploring additional features, fine-tuning hyperparameters, or trying different balancing techniques.
- Customer feedback loop: track the outcomes of retention efforts and feed them back into the model to continually improve its accuracy and effectiveness.
- Segment analysis: analyze customer segments in depth; understanding which segments churn more, and why, helps tailor retention strategies to each group.
- Communication strategy: address false positives so that retention efforts aimed at customers who were incorrectly predicted to churn do not appear unnecessary or intrusive.
- Monitoring and reporting: regularly monitor the model's performance and impact on churn rates, with dashboards tracking churners detected, retention rates, and overall accuracy.
- A/B testing: test different retention strategies based on model predictions to identify the most effective approaches and refine intervention tactics.

By implementing these insights and recommendations, the bank can leverage the predictive model to reduce customer churn, enhance customer satisfaction, and ultimately improve overall business performance.
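The tiered resource-allocation idea can be sketched in a few lines of NumPy. The probabilities and the 0.5/0.8 cut-offs below are illustrative assumptions; in the notebook the probabilities would come from the model's raw `predict` output before thresholding.

```python
import numpy as np

# Illustrative predicted churn probabilities for six customers (assumed values)
proba = np.array([0.12, 0.55, 0.83, 0.47, 0.91, 0.66])

# Bucket customers into risk tiers; conditions are checked in order
tiers = np.select(
    [proba >= 0.8, proba >= 0.5],
    ["high risk", "medium risk"],
    default="low risk",
)
print(list(tiers))
# ['low risk', 'medium risk', 'high risk', 'low risk', 'high risk', 'medium risk']
```

High-risk customers would then be routed to immediate retention outreach, medium-risk to lighter-touch campaigns, and low-risk to routine monitoring.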
y_test_pred = model_0.predict(X_test)  # For comparison: predictions from model_0 (NN with SGD), the weakest model
y_test_pred = (y_test_pred > 0.5)
print(y_test_pred)
63/63 [==============================] - 0s 1ms/step [[False] [False] [False] ... [False] [False] [ True]]
# Print the classification report
cr=classification_report(y_test,y_test_pred)
print(cr)
precision recall f1-score support
0 0.79 0.77 0.78 1593
1 0.18 0.19 0.19 407
accuracy 0.65 2000
macro avg 0.48 0.48 0.48 2000
weighted avg 0.66 0.65 0.66 2000
#Calculating the confusion matrix
make_confusion_matrix(y_test,y_test_pred)
import seaborn as sns
import matplotlib.pyplot as plt
# Define the features to analyze
features = ['CreditScore', 'Age', 'Tenure', 'Balance', 'EstimatedSalary']
# Create scatter plots with trend lines
for feature in features:
    plt.figure(figsize=(10, 6))
    sns.regplot(x=ds[feature], y=ds['Exited'], scatter_kws={'alpha': 0.5}, line_kws={"color": "red"})
    plt.title(f'Scatter plot of {feature} vs Exited with Trend Line')
    plt.xlabel(feature)
    plt.ylabel('Exited')
    plt.show()
from sklearn.preprocessing import PolynomialFeatures
# Select features for polynomial transformation
X = ds[['CreditScore', 'Age', 'Tenure', 'Balance', 'EstimatedSalary']]
y = ds['Exited']
# Create polynomial features
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)
# Display polynomial feature names
poly.get_feature_names_out(input_features=X.columns)
array(['CreditScore', 'Age', 'Tenure', 'Balance', 'EstimatedSalary',
'CreditScore^2', 'CreditScore Age', 'CreditScore Tenure',
'CreditScore Balance', 'CreditScore EstimatedSalary', 'Age^2',
'Age Tenure', 'Age Balance', 'Age EstimatedSalary', 'Tenure^2',
'Tenure Balance', 'Tenure EstimatedSalary', 'Balance^2',
'Balance EstimatedSalary', 'EstimatedSalary^2'], dtype=object)
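The 20 names above are the 5 original features, their 10 pairwise interactions, and 5 squared terms. The count can be checked combinatorially: with `include_bias=False`, the number of output columns is the number of monomials of total degree at most `degree` in `n_features` variables, minus the bias term.

```python
from math import comb

n_features, degree = 5, 2
# Monomials of total degree <= 2 in 5 variables, minus the constant (bias) term
n_poly = comb(n_features + degree, degree) - 1
print(n_poly)  # 20
```

This matches the length of the `get_feature_names_out` array shown above.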
sns.pairplot(ds[['CreditScore', 'Age', 'Tenure', 'Balance', 'EstimatedSalary', 'Exited']], hue='Exited')
plt.show()
Observation: the pairplot shows no clear pattern separating churned from retained customers across these numeric features.
import pandas as pd
from sklearn.metrics import recall_score
# Initialize empty dataframes to store recall values
train_metric_df = pd.DataFrame(columns=["recall"])
valid_metric_df = pd.DataFrame(columns=["recall"])
def train_and_evaluate_model(model_name, model, X_train, y_train, X_val, y_val):
    # Train the model
    model.fit(X_train, y_train, epochs=50, batch_size=32, verbose=1, validation_data=(X_val, y_val))

    # Predict on the training set
    y_train_pred = model.predict(X_train)
    y_train_pred = (y_train_pred > 0.5).astype(int)
    train_recall = recall_score(y_train, y_train_pred)

    # Predict on the validation set
    y_val_pred = model.predict(X_val)
    y_val_pred = (y_val_pred > 0.5).astype(int)
    val_recall = recall_score(y_val, y_val_pred)

    # Store the recall values in the comparison dataframes
    train_metric_df.loc[model_name] = train_recall
    valid_metric_df.loc[model_name] = val_recall
# Example models (assuming these are defined correctly elsewhere in your code)
model_sgd = ...
model_adam = ...
model_adam_dropout = ...
model_smote_sgd = ...
model_smote_adam = ...
model_smote_adam_dropout = ...
# Train and evaluate each model
train_and_evaluate_model("NN with SGD", model_sgd, X_train, y_train, X_val, y_val)
train_and_evaluate_model("NN with Adam", model_adam, X_train, y_train, X_val, y_val)
train_and_evaluate_model("NN with Adam & Dropout", model_adam_dropout, X_train, y_train, X_val, y_val)
train_and_evaluate_model("NN with SMOTE & SGD", model_smote_sgd, X_train_smote, y_train_smote, X_val, y_val)
train_and_evaluate_model("NN with SMOTE & Adam", model_smote_adam, X_train_smote, y_train_smote, X_val, y_val)
train_and_evaluate_model("NN with SMOTE, Adam & Dropout", model_smote_adam_dropout, X_train_smote, y_train_smote, X_val, y_val)
print("Training performance comparison")
print(train_metric_df)
print("Validation set performance comparison")
print(valid_metric_df)